---
title: 'Information Technology and Political Engagement: Mixed Evidence from Uganda'
author: "Guy Grossman, Macartan Humphreys, Gabriella Sacramone Lutz\\footnote{Support for this research was provided by National Science Foundation Grant number 1260631. Supplementary material for this article is available in the appendix in the online edition. }"
date: "March 24, 2019"
header-includes: 
  \usepackage{natbib}
  \usepackage{caption}
  \usepackage{booktabs}
  \usepackage{setspace}\onehalfspacing
output:
  pdf_document:
    number_sections: yes
    keep_tex: yes
    # pandoc_args: ["--include-after-body=20180319_appendix.tex"]
  html_document:
    # pandoc_args: ["--include-after-body=20180319_appendix.md"]
abstract: "This study integrates three related field experiments to learn about how Information Communications Technology (ICT) innovations can affect who communicates with politicians.  We implemented a nationwide experiment in Uganda following a smaller-scale framed field experiment that suggested that ICTs can lead to significant ``flattening'': marginalized populations used SMS-based communication at relatively higher rates compared to existing political communication channels. We find no evidence for these effects in the national experiment. Instead, participation rates are extremely low and marginalized populations engage at especially low rates. We examine possible reasons for these differences between the more controlled and the scaled-up experiments. The evidence suggests that even when citizens have issues they want to raise, technological fixes to communication deficits can be easily undercut by structural weaknesses in political systems."
keep_tex: TRUE
bibliography: bib.bib
---

\pagenumbering{gobble} 

\newpage

# Information Technology and Political Engagement: Mixed Evidence from Uganda {-}

**Abstract**

This study integrates three related field experiments to learn about how Information Communications Technology (ICT) innovations can affect who communicates with politicians.  We implemented a nationwide experiment in Uganda following a smaller-scale framed field experiment that suggested that ICTs can lead to significant "flattening": marginalized populations used SMS-based communication at relatively higher rates compared to existing political communication channels. We find no evidence for these effects in the national experiment. Instead, participation rates are extremely low and marginalized populations engage at especially low rates. We examine possible reasons for these differences between the more controlled and the scaled-up experiments. The evidence suggests that even when citizens have issues they want to raise, technological fixes to communication deficits can be easily undercut by structural weaknesses in political systems.

March 24, 2019
\newpage

\pagenumbering{arabic} 

```{r setup0, include=FALSE}

rm(list=ls())

knitr::opts_chunk$set(echo = TRUE)
set.seed(1)

pks <- c("gridExtra", "grid", "lattice", "cowplot", "readxl", "magrittr","plyr","dplyr", "dtplyr","foreign","ggplot2","knitr", "MASS", "multiwayvcov", "plm","reporttools", "Rmisc", "stargazer", "systemfit", "xtable", "data.table", "zoo", "readstata13",  "MASS","sandwich", "lmtest", "AER", "markdown", "tidyr", "readstata13", "dummies", "estimatr")

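# Install any missing package, then load it; character.only = TRUE is needed so that
# require() loads the package named by the value of x rather than a package called "x"
invisible(lapply(pks, function(x) if (!require(x, character.only = TRUE)) {install.packages(x, repos = "http://cran.us.r-project.org"); require(x, character.only = TRUE)}))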

library(estimatr) 
library(knitr)

output_folder <- "02_figures"

local_data <- TRUE



```

```{r setup1, message=FALSE, warning=FALSE, include=FALSE}


if(local_data ){
  citizens        <- suppressWarnings(read.dta("01_data/CitizensEndline.dta", convert.factors = F) )
  caseload        <- read.dta("01_data/Caseload.dta")
  caseload_month  <- read.dta("01_data/CaseloadMonths.dta")
  case_post       <- read.table("01_data/Fullcaselistraw.csv",  fill = TRUE, header=TRUE, sep=",")
  voters          <- read.dta("01_data/Voters.dta", convert.factors = FALSE)
  results         <- read.csv("01_data/Parish2010/presidential_results_PS11.csv")
  callback        <- read.dta("01_data/Callback.dta", convert.factors = F)
 # exp_results_old <- read.dta("01_data/new experiment/selected_treatment.dta")
  exp_results     <- read.dta("01_data/new experiment/ExpResults.dta")
  sms             <- read.csv("01_data/20180122_caseload_retrieved_eacode.csv")
  MP              <- suppressWarnings(read.dta13("01_data/MP data/output/MP_data.dta"))
  ffe             <- suppressWarnings(read.dta13("01_data/GHS_Replication_NEWBLOCKS.dta"))
  endline_sample  <- suppressWarnings(read.dta13("01_data/sample.dta"))
}


```

```{r prep, echo = FALSE}

citizens  %<>% 
             filter(uspeak_sort %in% c(0,1)) %>% # subset to control and treatment sample
             mutate(IPW    = uspeak_sort*(1/PROB) + (1-uspeak_sort)*(1/(1-PROB)),
                     uspeak = factor(uspeak_sort, labels=c("Control","Treat")),
                     # Recode influence vars into binary indicators (1 if response > 2)
                     q67_a_influe_pres_REC2 = (q67_a_influe_pres_REC1 > 2)*1,
                     q67_b_influe_mps_REC2 = (q67_b_influe_mps_REC1 > 2)*1,
                     q67_c_influe_lc5_REC2 = (q67_c_influe_lc5_REC1 > 2)*1,
                     q67_d_influe_clan_REC2 = (q67_d_influe_clan_REC1 > 2)*1,
                     q67_e_influe_kingdom_REC2 = (q67_e_influe_kingdom_REC1 > 2)*1)

# fix D_PROB discrepancy for same elec area
citizens$D_PROB[citizens$eacode==67] <- median(citizens$D_PROB[citizens$eacode==67])

```

\doublespacing

# Introduction {#S1}

Weak political communication channels characterize many developing countries. Traditional aggregators of interests, such as civil society organizations, labor unions and political parties, have limited reach, and regular, high-frequency, public opinion polls are all but non-existent. Many citizens have only limited opportunities to directly communicate with politicians, usually around elections, and only very few are willing to bear the high costs of reaching out to representatives to articulate preferences outside electoral cycles. Specifically, constituents do not invest in articulating preferences if they doubt that government officials would be responsive to citizen demands.^[A low sense of (external) efficacy may be especially prevalent where governments have low capacity and/or low levels of legitimacy [@craig1990political]. A sense of (internal) efficacy---i.e., the belief that one has the personal ability to participate effectively in politics [@niemi1991measuring]---can be especially weak for marginalized populations, whether defined by gender, education, wealth or partisanship [@coleman1976structural]. Although voting rates are sometimes higher among the poor [@kasara2014rich] and less educated [@croke2015deliberate], this often reflects differences in mobilization (or repression) in different contexts and does not extend immediately to other types of political engagement.] Such a weak sense of political efficacy is compounded by the high cost of traditional forms of political communication---e.g., traveling large distances to meet public officials in person---which further reduces citizens' incentive to proactively reach out to politicians in order to articulate interests, priorities, needs and preferences.

Weak political communication channels may have important implications for the health of a country's democratic institutions:  with poor information on their constituents' preferences and policy priorities, elected representatives have a hard time representing interests, and political parties cannot differentiate themselves in meaningful ways [@BleckWalle2012]. When parties are non-programmatic, the accountability relationship between office holders and voters can narrow down to local clientelistic exchange [@stokes2013brokers]. The starting point of this study is that strengthening weak political communication channels offers a promising way to begin improving political representation.

In this paper, we report findings from a multi-year research project (involving three related field experiments) designed to test whether innovations in information communication technologies (ICTs) can be harnessed to improve political communication in low-income countries. Since the existence and costs of new ICT platforms are likely correlated with features of a political system that may independently determine political engagement, assessing the effects of technological innovations on political communication is fraught with difficulties. To overcome this identification challenge, we partnered with the \href{http://www.parliament.go.ug/new/index.php/about-parliament/parliamentary-news/100-speaker-launches-uspeak}{Parliament of Uganda} and the National Democratic Institute (\href{https://www.ndi.org/uganda_uspeak_launch}{NDI}), an international non-government organization (NGO), to implement one of the largest field experiments involving political elites to date.

Our primary experiment examines a nationwide Parliament-led program that introduced a new channel for contacting elected representatives. In the terminology of @HarrisonList2004, this experiment is a "natural field experiment" (NFE), implemented as part of the political process in Uganda. The program established and subsidized a mobile-technology platform for political communication with the goal of increasing and diversifying citizen voice. Citizens in over 100 treatment constituencies were able to communicate with their Member of Parliament (MP) by sending text-messages. The design involved experimental variations in the *price* of messaging as well as encouragements that communicated the usage by others ("feedback"). In principle these allow us to assess how beneficial effects of communication systems depend on design, as well as helping us to learn about the logics of constituents' decision-making.   

MPs representing treated constituencies could respond to messages via the platform and use the system's functionalities to aggregate messages and to learn about usage patterns over time. The ICT platform was introduced to voters via twice-daily short radio ads in nineteen national languages over a six-month period. This experiment is unusual in scale---the program involved about 10 million voters---but also in nature: change in access was led by political elites and thus provided a relatively strong invitation to citizens to engage in politics.

The design allows us to examine four primary questions. First, we ask what the overall levels of political engagement through this ICT channel are. Second, we assess whether this channel plausibly flattens citizens' access to MPs. Third, in line with past work, we assess whether the cost of communication (price) matters: in principle, flattening effects could be strengthened or weakened if cost considerations shape the decision to send text messages. Fourth, we examine *downstream* effects, recognizing that citizens' attitudes---especially their sense of efficacy and trust in government---could be affected by the introduction of a communication technology even if they do not elect to use it.^[These dimensions are derived, in part, from a simple theoretical model found in the supplementary material, in which we explore implications arising from the fact that politicians might be both biased against certain groups and lack information about citizens' priorities. Though the basic logic that politicians respond more effectively to citizens when better informed is simple enough, the implications of this for what sorts of citizens are more likely to bear costs to inform politicians can be quite complex.  Our model thus highlights the kinds of theoretical ambiguities that can arise in this context.] 

The results of the nation-wide field experiment are disappointing: system usage in treatment constituencies was low, and marginalized populations largely refrained from using the ICT platform. Pricing did not seem an important consideration and evidence for downstream effects was weak at best. In fact, because of the disappointing level of citizen engagement and revealed---as compared to stated---low interest among Members of Parliament (MPs), the Ugandan Parliament ultimately decided to phase out the SMS service. 

Strikingly, these disappointing findings differed markedly from findings from a more controlled experiment---in the terms of @HarrisonList2004, a "framed field experiment" (FFE)---undertaken before the national program was rolled out and reported in @grossman2014wld. Results from the FFE suggested a relatively high demand, and that mobile technology could democratize political communication because marginalized constituents were willing to engage at relatively high rates and were not more price sensitive, compared to less marginalized voters. By contrast, the NFE found little citizen involvement and no improvement in *differential* access to political elites. 

In the second part of the paper we take advantage of differences between the NFE and the FFE to explore the reasons for the disappointing findings in the national experiment. Since both experiments were implemented using subjects from constituencies across Uganda, they involved similar populations, eliminating a common external validity concern: that replications tend to fail because of unobserved features of the experimental subject pool [@allcott2015site]. We find that a large part of the explanation hinges on the ability of government to *reach* citizens to engage them in this kind of innovation. In particular, our findings cast doubt on the utility of using short radio ads to elicit wide-scale participation.^[Plausibly, radio programming may be effective even if radio ads are not [@yanagizawa2014propaganda; @adena2015radio].] With that said, we still estimate that about 1 million Ugandans received the encouragement to contact officials. It is therefore possible that differences are due (also) to pure scale effects. We assess this possibility by looking for evidence of strategic substitution, exploiting exogenous variation in feedback on system usage.^[Relatedly, see @ferrali2018peer, who explicitly model messaging politicians as subject to positive externalities, which is appropriate when feedback can facilitate voter coordination [@arias2017malfeasance]. Substitution effects are also possible if free-rider logics are in operation.] We do not, however, find evidence in support of this explanation. Instead, we find relatively strong evidence that voters doubt the efficacy of contacting their MP directly (a doubt partly counteracted by direct invitations of the form present in the FFE) and suggestive evidence that larger (structural) inequalities prevented the ICT program from having effects at scale. 


This paper makes several contributions to the literature on political communication, and especially to our understanding of inequalities in political participation. We highlight how the underlying willingness to engage in politics---even when using low-cost impersonal communication channels---crucially depends on citizen beliefs about the effectiveness of political engagement, which itself likely depends on politicians' response to incoming messaging.^[For recent studies making similar claims, see @sjoberg2017effect, @christensen2017elections and @GrossmanMichelitchSantamaria2015.] Though the relationship is not causally identified, we provide evidence below that usage of the system was tightly connected to MPs' (in)action. ICTs, we argue, in and of themselves, do not make non-responsive politicians responsive. 

The paper also contributes to a growing literature on the effectiveness of using ICT innovations to improve governance outcomes [@peixoto2017civic]. Past studies have focused on using ICTs to reduce absenteeism among frontline service providers [@duflo2012incentives;@grossman2018crowdsourcing], improve election integrity [@callen2014institutional], increase engagement in local affairs [@buntaine2017can], and report corruption [@BlairLittmanPaluck2015] and violent incidents [@van2015crowdseeding]. Ours is the first study to examine the role ICTs may play in altering citizen-MP relationships in the context of low-income countries.

Our study also contributes to ongoing methodological debates on the utility of relatively small-scale controlled experiments, such as the framed field experiment described here (and, a fortiori, "artefactual" field experiments or lab experiments), in shedding light on core political processes. Most field experiments---including many natural field experiments---are implemented on a small scale but seek to make claims about large-scale processes. For example, small-scale experiments may be used to test new approaches, be designed as a proof of concept, or test micro-logics that arguably underlie general features of human behavior. Indeed, much of the "credibility revolution" in the study of international development is premised on the idea that small-scale field experiments can create a body of knowledge that allows promoting "what works" and eliminating programs and policies that do not [@banerjee2009experimental]. Yet, it is often contestable whether the results of small-scale field experiments can accurately inform theory or form the basis for more general policy [@Manski2013]. Our study distinguishes between explanations for when and why such inferences may not be valid and garners evidence for or against these different explanations.

In the remainder of this paper we introduce the research questions that the different field experiments were designed to answer and present the design and results from the scaled-up national program. We then present analyses designed to assess mechanisms that could account for differences in outcomes. Our conclusions focus on the implications for efforts to democratize political communication, and on the implications for learning about political processes from controlled experiments. 

# Research Design {#sec:design}

The field experiment we study was part of the national strategy of Uganda's Parliament for widening citizen voice. To the best of our knowledge, it is one of the largest political field experiments ever to be undertaken with consenting political elites.^[Our study joins a growing body of work using politicians as experimental subjects. See, for example, @sheffer2018nonrepresentative and @leveck2014role on politicians' decision making, and @GrossmanMichelitch2016 for politicians' response to disseminating information on their performance in office. Such studies can raise ethical concerns insofar as they involve interventions in democratic processes. In this case, however, we highlight that the initiative was undertaken with the explicit consent of politicians and was owned by the Parliament of Uganda, operating through the parliament's website, with the intention of strengthening democratic processes. The uSpeak program involved no deception of any form.] Below we describe the political context that gave rise to this intervention---summarizing results from the framed field experiment implemented prior to the national intervention---and describe the design of the national intervention and the data used to study it.  

## Political Context {#S3}

Uganda provides a good context for exploring changes to behavior in the wake of introducing a new political communication platform. First, Uganda shares characteristics with many low-income countries on relevant dimensions. It is in the mid-range of the World Bank's low-income economies in terms of economic development (as captured by GDP per capita) and of human development (as captured by HDI ranking).^[Low-human development countries are ranked between 148 (Swaziland) and 188 (Central African Republic). Uganda is ranked 163 (in 2016).] In addition, Uganda is in the middle range of ICT ownership, use and access among African countries [@DigitalDivides]. These factors strengthen confidence in the external validity of our results. 

Second, data from Uganda support the assumption that weak political communication channels lead to a dearth of information in the hands of politicians. Consider results culled from a survey the research team conducted with Ugandan Members of Parliament at baseline. We find that the majority of surveyed MPs describe themselves as feeling insufficiently informed when they vote in plenary and in committee meetings. In other work, surveyed Ugandans report that elected politicians do not frequently elicit voter opinions [@GrossmanMichelitchSantamaria2015]. This evidence suggests that the context is one in which there is an unmet demand for greater information. 

Third, results from the framed field experiment (FFE) conducted prior to the launch of the national field experiment further point to Uganda as a good context for studying the questions at hand. Specifically, findings from the framed field experiment suggest not only that there exists underlying demand in Uganda for contacting one's MP via a text-messaging platform, but also that IT communications do not necessarily widen the participation gap between more and less marginalized populations. We briefly describe the FFE below.^[A more detailed description of the FFE can be found in @grossman2014wld.]

The framed field experiment, undertaken in 2011, was delivered alongside a survey conducted in every parliamentary constituency in Uganda using a nationally representative sample. The FFE sought to assess whether demand existed and to explore the validity of the concern that IT-based communication platforms exacerbate existing inequalities in political access. At the end of the survey, sampled respondents were invited to send a text message to their MP at randomly assigned prices. As discussed in more detail in @grossman2014wld, the communication rate recorded in the FFE---about 5%---suggests that a sizable number of citizens value the opportunity to contact their MPs via SMS. 
   
In addition, usage rates in the FFE were no lower among more marginalized populations, possibly reflecting the fact that these populations have fewer opportunities to access politicians and therefore place a higher value on impersonal and inexpensive ICT channels. Experimentally manipulating the price of sending a text message to one's MP, we further found, as expected, that reducing the cost of communication encouraged usage.^[Usage was almost 50% higher for those randomly assigned to a free SMS treatment arm, as compared to those assigned to a treatment group that was not offered any subsidy for texting their MP.] Moreover, consistent with the idea that marginalized populations place a higher value on cheap impersonal communication, we found that marginalized populations were not more sensitive to the cost of political communication than less marginalized populations. 

The FFE confirmed that Uganda offers a good context to examine the implications of harnessing technological innovations to improve political communication, and that ICT platforms have a potential to alter citizen-MP relations and "flatten" political access. However, the setup of the FFE also had some limitations. For example, it allowed only a "one-shot" opportunity to communicate with MPs, and thus was unable to examine usage patterns over time, in which citizens' behavior is (also) a function of both the usage of *other* citizens and the response of their MP to past messages. Moreover, it was implemented in the context of an in-person survey in which subjects interacted with enumerators regarding their political views. While such personal interaction ensures that subjects are aware of the program, it is also prohibitively costly. Thus, there is no guarantee that using mass communication channels at scale (such as radio) would result in a strong first-stage; i.e., wide-range program awareness. The personal interaction with enumerators may have also made politics more salient to interviewed subjects, further strengthening the invitation to use the platform. The personalized invitation to contact one's MP may have also increased both the sense of empowerment and civic obligation to raise one's voice. It is also possible that subjects perceived the FFE as closer to a civil society effort than an official government program. 

These considerations raise the question of whether similar effects would be found when the ICT service was brought to scale, and shifted from being a researcher-led initiative to being an institutionalized part of national politics. The field experiment described in the next section was designed to address these concerns. 

## The uSpeak Initiative

As part of the Ugandan Parliament's national strategy, a case management platform hosted in the National Parliament was developed, allowing citizens to send messages to their MP via SMS or a voice call to a call center. MPs randomly assigned to participate in the program ("uSpeak") were given access to the platform and trained in its use. The platform allowed MPs to log onto a dashboard where they could read tagged SMS messages from constituents, reply, and see simple descriptive statistics about the messages they received, such as what the priority issues in their constituency were within a selected time-frame. A screenshot of the query dashboard is presented in the Supplementary Information (Figure 2). Only treated MPs were able to receive messages from their constituents via the case management system.

The ICT platform was promoted to citizens through 30-second radio advertisement spots, played twice daily on local radio stations over the study's six-month period. The radio ads were in local languages, and featured a skit where actors portraying constituents talked about how uSpeak could be used to draw the MP's attention to important issues, specifically service delivery deficiencies. These skits were first tested using focus groups. A second tier of randomly assigned treatments---price and feedback---was also delivered via the radio ads.

**Treatment 1: Elite Participation**

The NFE involved 186 MPs who volunteered to be part of a six-month pilot. It was expected that, if deemed successful, all MPs would be phased into the program at the end of the study. Given the sensitivities of providing a new service to only some constituencies, it was agreed that MPs would be selected into the program using a public lottery managed by NDI. Block randomization was used to assign MPs to treatment groups; MPs were sorted into bins based on their *type* (Woman MP or Constituency MP), *party*, and *region*.^[Each bin was used to implement a separate public lottery with a target number of MPs selected into treatment based on that MP type's prevalence in the subject pool. Block randomization was used not simply to improve balance in expectation, but also to improve ex-post equality between parties in participation. See Supplementary Information, Section 3.5 for additional information on the public lottery.] 
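The logic of the blocked lottery can be illustrated with a minimal sketch. This is not the procedure NDI ran: the subject pool, the category labels, the block sizes and the target share below are all hypothetical, chosen only to show how a separate draw within each type-by-party-by-region bin fixes the number of treated MPs per bin.

```r
# Hypothetical subject pool: ids, labels and block sizes are illustrative only.
set.seed(1)
mps <- data.frame(
  mp_id  = 1:32,
  type   = rep(c("Woman MP", "Constituency MP"), each = 16),
  party  = rep(rep(c("A", "B"), each = 8), times = 2),
  region = rep(rep(c("North", "South"), each = 4), times = 4)
)

# One block ("bin") per type x party x region combination.
mps$block <- interaction(mps$type, mps$party, mps$region, drop = TRUE)

# Assumed target share of MPs selected into treatment within each block.
target_share <- 0.5

# Run a separate lottery inside each block: build a vector with the target
# number of 1s (treated) and 0s (control), then shuffle it.
mps$treat <- ave(
  seq_len(nrow(mps)), mps$block,
  FUN = function(idx) {
    n_treat <- round(length(idx) * target_share)
    sample(rep(c(1L, 0L), c(n_treat, length(idx) - n_treat)))
  }
)

# Treated and control counts by block.
table(mps$block, mps$treat)
```

Because the number of treated units is fixed within every bin, balance on the blocking variables holds by construction rather than only in expectation.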

**Treatment 2: Variation in Price**

To assess the effects of price on usage, Parliament randomly varied the cost of sending a message to MPs via the uSpeak system, across and within constituencies. Each constituency was assigned 3 months in which uSpeak would be provided free of charge and 3 months without any subsidization. To guard against potential sequence effects, all possible sequences of full price and free months were randomly assigned to constituencies in the treatment group using a blocked design. Note that while the variation in prices in the first period provides a clean separation into price groups, for identification based on variation in subsequent months we must assume no carryover effects.
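As a rough illustration of the space of price schedules, the sketch below enumerates every way of placing three free months among six monthly periods. The labels and object names are ours, and the enumeration shows only what is possible in principle, not which sequences were fielded.

```r
# Enumerate all orderings of 3 free and 3 full-price months (illustrative).
n_months <- 6
n_free   <- 3

# Each column of free_sets holds the positions of the free months.
free_sets <- combn(n_months, n_free)

sequences <- apply(free_sets, 2, function(free) {
  s <- rep("full", n_months)
  s[free] <- "free"
  paste(s, collapse = "-")
})

# Number of distinct sequences: choose(6, 3)
length(sequences)

# Share of sequences that start with a free month
mean(startsWith(sequences, "free"))
```

Half of the sequences begin with a free month, which is why a first-period comparison separates constituencies cleanly into two price groups without any assumption about carryover.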

**Treatment 3: Variation in Feedback**

In order to examine whether information on *others'* usage encourages greater system usage, we added a "feedback" treatment arm delivered through modification of the base radio ads. In one version, voters heard that others had been sending messages to the system about the need to do more in the educational sector. A second variation also highlighted the educational sector but without communicating that others had been using the system to lobby in that area. To the extent that there are complementarities in public goods messaging, we expect that hearing that others are sending messages about education should increase the willingness to contact one's MP. Indeed, our feedback skit was written explicitly in a way that made this sort of complementarity more apparent to radio listeners.^[By contrast, if people view text messages as substitutes, then hearing that others are using the IT system could exacerbate the collective action problem. The feedback ads are shown verbatim in the Supplementary Information, Section 3.3.] We make use of this variation to help unpack reasons for differences in results between studies.   

We selected eight unique price sequences and six unique combinations of the feedback treatment that together produced 48 unique combinations of price and feedback sequences. These were assigned in a balanced way to treatment constituencies, resulting in roughly two constituencies of each unique treatment schedule. In Supplementary Information (Figure 5) we provide an example of treatment schemes for a subset of constituencies.  
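The crossing of price and feedback sequences, and a balanced allocation of the resulting schedules, can be sketched as follows. The sequence labels are placeholders, and the number of constituencies (96) is a hypothetical value chosen so that each schedule is used exactly twice; the actual allocation was only roughly balanced.

```r
set.seed(1)

# Placeholder labels: the actual sequences are not listed in this section.
price_seq    <- paste0("price_", 1:8)
feedback_seq <- paste0("feedback_", 1:6)

# Cross the two factors: 8 x 6 = 48 unique treatment schedules.
schedules <- expand.grid(price = price_seq, feedback = feedback_seq,
                         stringsAsFactors = FALSE)
nrow(schedules)

# Hypothetical number of treatment constituencies (assumption, see lead-in).
n_const <- 96

# Balanced assignment: recycle the schedule ids to length n_const, then shuffle.
assignment <- data.frame(
  constituency = seq_len(n_const),
  schedule     = sample(rep_len(seq_len(nrow(schedules)), n_const))
)
assignment$price    <- schedules$price[assignment$schedule]
assignment$feedback <- schedules$feedback[assignment$schedule]

# Each schedule should appear the same number of times.
table(table(assignment$schedule))
```

When the number of constituencies is not a multiple of 48, `rep_len()` leaves some schedules with one more constituency than others, which corresponds to the "roughly two" per schedule described above.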

## Data

Data for testing the effects of the uSpeak program come from four sources: (1) a baseline survey of Ugandan adults randomly drawn from all constituencies in Uganda, conducted immediately following the 2011 Parliamentary election, (2) the SMS messages sent by constituents to the uSpeak system, tagged with the date and time they were received, (3) a callback phone survey we conducted with uSpeak users, and (4) an endline survey of a nationally representative sample of Ugandan adults in a subset of the uSpeak constituencies. In addition, as described below, we conducted a follow-up experiment with about 3,000 Arua district residents to help adjudicate some of the conflicting findings between the natural field experiment and the framed field experiment. 

# Main Results {#S4} 

We focus on core results related to overall usage, flattening (the characteristics of participating populations), price effects, and downstream effects. We note that usage and flattening are not experimental treatment effects in the usual sense; rather, they are levels assessed under controlled conditions. Price and feedback effects draw on randomized variation within treatment, and downstream effects draw on randomized MP participation in the intervention, as described above. Analyses implemented to explain our findings on usage are described in Section \ref{S5}.

## Low Rates of Communication


```{r fig1_setup, echo = FALSE, include = FALSE}

# SMSs sent during experiment
case_exp <- dplyr::select(caseload, Datestarted)
case_exp$Date = as.Date(strptime(case_exp$Datestarted, "%d/%m/%Y %H:%M:%S %p"))
case_exp  <- case_exp[case_exp$Date >= as.Date("2012-08-01"),] 
case_exp %<>% dplyr::select( Date, Datestarted)

end <- max(case_exp$Date)

# SMSs sent post experiment
case_post %<>% dplyr::select( Date.started)
case_post$Date = as.Date(case_post$Date.started, "%d/%m/%Y")
case_post$date.dum <- substr(case_post$Date, start=1, stop=1)
case_post$Date[case_post$date.dum==0] <- as.Date(case_post$Date.started[case_post$date.dum==0], "%d/%m/%y")
case_post %<>% dplyr::select(Date, Datestarted = Date.started) %>%
               filter(Date > end) 

# Check datasets don't overlap
min(case_post$Date)
end

# bind together and prep
totalcase <- rbind(case_exp,case_post) 

totalcase <- totalcase[totalcase$Date >= as.Date("2012-08-01"),]

totalcase<- totalcase[order(totalcase$Date), ]

totalcase$seq <- 1:nrow(totalcase)

```

```{r fig1, echo = FALSE, include = FALSE}

pdf(paste0(output_folder,"/post_pilot.pdf"), width = 9, height = 5)
  plot(totalcase$Date, totalcase$seq, type="l", ylab = "Cumulative number of Messages", xlab = "Date")
  
  # ---------- #
   #  Ablines
  # ---------- #
  
  ab_dates <- c("2012-09-01", "2012-10-01", "2012-11-01", "2012-12-01", "2013-01-01", "2013-02-01", 
                "2013-03-01", "2013-04-01", "2013-05-01", "2013-06-01", "2013-07-01", "2013-08-01",
                "2013-11-01", "2013-12-01", "2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01")
  
  sapply(ab_dates, function(x) abline(v = as.Date(x), lty = 3) )
  
  
  
  # ---------- #
   # Polygons
  # ---------- #
  
  start <- c(pilot   = as.Date("2012-08-01"), f2fm = as.Date("2012-09-14"), 
               lindsay = as.Date("2013-04-01"), mp   = as.Date("2013-04-01"), 
               washout = as.Date("2012-11-01"), youth = as.Date("2013-07-31")) 
  
  end   <- c(pilot   = as.Date("2013-02-28"), f2fm = as.Date("2012-09-23"), 
               lindsay = as.Date("2013-07-15"), mp   = as.Date("2013-04-30"),
               washout = as.Date("2012-11-30"), youth = as.Date("2013-08-08"))
  
  colrs <- c(pilot   = rgb(0,1,0,.4), f2fm = rgb(0,.65,.25,0.5),
             lindsay = rgb(1,0,0,.25), mp  = rgb(1,0,0,.5), 
              w_col  = grey(.8) , youth =   rgb(0.7,0.6,1, alpha=.5))
  
  
  sapply(1:length(start),function(i)
      polygon(c(end[i], end[i], start[i], start[i]), c(-1000,5000,5000,-1000), col = colrs[i], border=NA))
  
  # ---------- #
   #   Texts
  # ---------- #
  
  t_date   <- c(plt = "2012-11-15", wout = "2012-11-15", f2fm = "2012-09-17", rmb = "2013-05-25", mp = "2013-04-15", youth1 = "2013-08-01", youth2 = "2013-08-15", youth3 = "2013-08-31")
  t_height <- c(pilot = 500, washout = 2200, f2fm = 2200, remob = 500, mp = 2200, y1 = 2200,y2 = 2200, y3 = 2200)
  t_cex    <- c(pilot = 1.5, washout = 0.7,  f2fm = 0.7,  remob = 1.5, mp = 0.7, y1 = 0.6, y2 = 0.6, y3=0.6)
  t_srt    <- c(pilot = 0,   washout = 90,   f2fm = 90,   remob = 0,   mp = 90,  y1 = 90, y2 = 90, y3= 90)
  
  texts    <- c("Research Period", "Washout Period", "Face to Face Marketing", "Re-Mobilization", "MPs Retrained", 
                "Addition of Youth MPs", "Distribution of Flyers", "Blast Messaging")
  
  sapply(1:length(texts), function(i)
    text(as.Date(t_date[i]), t_height[i], texts[i], cex = t_cex[i] , srt = t_srt[i]))
  
  
  
  with(totalcase,points(Date, seq, type="l" ))
  
  box()

dev.off()

```





Unlike the FFE described above, system usage in the NFE was very low. Despite twice-daily radio ads and price subsidization throughout the country, MPs in the treatment group received a total of `r nrow(caseload) ` messages during the 6-month study period. Using the most recent (2014) population census, we conservatively estimate that the radio ads were played over an area where 10 million voters reside. This corresponds to a monthly usage rate of about 1 in 30,000. Figure \ref{fig:Uptake} shows cumulative messaging over time, extending beyond the study period to show usage in the post-study period, including various periods in which Parliament and NDI used an assortment of mobilization efforts---none of which produced sustained effects.
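
As a rough check on the arithmetic behind this rate (a sketch only, assuming that the 10 million voters reached by the radio ads are the relevant denominator and that messages arrived evenly over the six months; the symbol $N$ for the total number of messages is introduced here purely for illustration):

$$
\text{monthly usage rate} \approx \frac{N/6}{10^{7}}, \qquad \frac{1}{30{,}000} \times 10^{7} \approx 333 .
$$

Under these assumptions, a rate of about 1 in 30,000 amounts to roughly 333 messages per month across the entire area covered by the ads.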

\begin{figure}[!h]
\caption{{uSpeak Natural Field Experiment: Usage}}
\centering
\includegraphics[width=1\linewidth]{02_figures/post_pilot.pdf}
\label{fig:Uptake}
\caption*{\small{\textbf{Note}: Cumulative messaging over time. Gray area represents the wash-out period in which no radio spots were played. Green areas denote the period with experimental variation. The figure also shows usage in the post-experimental period, in which there were attempts by Parliament and NDI to further encourage usage. }}
\end{figure}


A broad categorization of the types of messages suggests that, as with the Framed Field Experiment (reported in [authors]), a large share of messages concerned local public goods or local community interests, with a much smaller share concerning national or policy matters. A much larger share of messages here, however, was of a personal nature: nearly half of messages sent, compared to at most 10% in the FFE. See Supplementary Information, Table 2.

## No Flattening Effects
  
One of the key findings of the FFE was that the share of marginalized populations---such as women and the poor---among system users was higher than the share of marginalized constituents participating in traditional forms of political engagement. That finding formed the basis of our conclusion that ICT platforms have a genuine potential to flatten political access.

To assess flattening in the national experiment, we conducted a phone survey of system users. Using a call center that the research team had set up, local enumerators contacted all uSpeak users no later than two months after they had sent a text message to their MP. The short callback survey was designed to elicit information on users' demographics, on whether they received a response from their MP, and on their general satisfaction with the ICT service.

<!--More information on the logistical aspects of the callback survey is in the Supplementary Materials.-->

Comparing results from our callback survey to information culled from the FFE, it is clear that the scaled-up national program failed to replicate the flattening effect identified in the FFE. Specifically, the users of the uSpeak system were wealthier, more highly educated, and overwhelmingly male, compared to those sending text messages in the FFE. Put plainly, the uSpeak program failed to elicit participation from marginalized populations in the way political actors expected. Figure \ref{fig:AllDiff} provides information on the distribution of wealth, gender, and education across the two field experiments.

```{r hist_fct, message=FALSE, warning=FALSE, include=FALSE}

# Draw two density histograms in sequence (x, then y) with shared breaks and
# custom x-axis labels; callers set par(mfrow) to place them side by side.
hist_diff <- function(x = Pilot, y = uSpeak, title.x = "Users in the FFE", title.y = "Users in the uSpeak NFE", labels, c.axis1, brks,  at.axis1){
  hist(x, freq = FALSE, xaxt = 'n', main = NULL, breaks = brks)
  axis(1, at = at.axis1, labels = labels, cex.axis = c.axis1)
  title(main = title.x)
  
  hist(y, freq = FALSE, xaxt = 'n', main = NULL, breaks = brks)
  axis(1, at = at.axis1, labels = labels, cex.axis = c.axis1)
  title(main = title.y )
}

```





```{r gender_diff, message=FALSE, warning=FALSE, include=FALSE}

Pilot <- voters$sex[voters$callback==0]
uSpeak <-voters$sex[voters$callback==1]

pdf(paste0(output_folder, "/GenderDiff.pdf"), width = 6, height = 4)

  par(mfrow=c(1,2))
  
  hist_diff( x = Pilot, 
             y = uSpeak, 
             title.x = "Users in the FFE", 
             title.y = "Users in the uSpeak NFE", 
             labels  = c("Women", "Men"),
             at.axis1 = c(-.75,.75),
             c.axis1  = 1,
             brks    = -1:1)

invisible(dev.off())
  


```

```{r wealth_diff, message=FALSE, warning=FALSE, include=FALSE}

  Pilot <- voters$q41[voters$callback==0]
  uSpeak <-voters$q41[voters$callback==1]
  
pdf(paste0(output_folder,"/WealthDiff.pdf"), width = 9, height = 4)
  par(mfrow=c(1,2))
  

  hist_diff( x = Pilot, 
             y = uSpeak, 
             title.x  = "Users in the FFE", 
             title.y  = "Users in the uSpeak NFE", 
             labels   = c("Much Worse", "Worse", "Same", "Better",  "Much Better"),
             at.axis1 = .5:4.5,
             c.axis1  = 1,
             brks     = 0:5)

invisible(dev.off())
```

```{r edu_diff, message=FALSE, warning=FALSE, include=FALSE}

Pilot <- voters$Education[voters$callback==0]
uSpeak <-voters$Education[voters$callback==1]


pdf(paste0(output_folder, "/EduDiff.pdf"), width = 7, height = 4)
  par(mfrow=c(1,2))
  
  hist_diff( x = Pilot, 
           y = uSpeak, 
           title.x = "Users in the FFE", 
           title.y = "Users in the uSpeak NFE", 
           labels  = c("No schooling", "Informal", "Some primary", "Primary", 
                      "Some secondary", "Secondary", "Post-Secondary", "University"),
           at.axis1 = c(0:7) ,
           c.axis1  = .7,
           brks    = 0:7)

invisible(dev.off())

```

```{r all_diff, message=FALSE, warning=FALSE, cache=FALSE, include=FALSE}

pdf(paste0(output_folder, "/AllDiff.pdf"), width = 7, height = 7)
  par(mfrow=c(3,2))
  
  # Gender
  Pilot <- voters$sex[voters$callback==0]
  uSpeak <-voters$sex[voters$callback==1]
  
  
  hist_diff( x = Pilot, 
             y = uSpeak, 
             title.x = "Gender: FFE Users", 
             title.y = "Gender: NFE (uSpeak) Users", 
             labels  = c("Women", "Men"),
             at.axis1  = c(-.75,.75),
             c.axis1  = 1,
             brks    = -1:1)
  
  # Wealth
  
  Pilot <- voters$q41[voters$callback==0]
  uSpeak <-voters$q41[voters$callback==1]
    
  hist_diff( x = Pilot, 
             y = uSpeak, 
             title.x  = "Wealth: FFE Users", 
             title.y  = "Wealth: NFE (uSpeak) Users", 
             labels   = c("Much Worse", "Worse", "Same", "Better",  "Much Better"),
             at.axis1 = .5:4.5,
             c.axis1  = 1,
             brks     = 0:5)
  
  # Education
  
  Pilot <- voters$Education[voters$callback==0]
  uSpeak <-voters$Education[voters$callback==1]
  
  hist_diff( x = Pilot, 
             y = uSpeak, 
             title.x = "Education:  FFE Users", 
             title.y = "Education: NFE (uSpeak) Users", 
             labels  = c("No schooling", "Informal", "Some primary", "Primary", 
                        "Some secondary", "Secondary", "Post-Secondary", "University"),
             at.axis1 = c(0:7) ,
             c.axis1  = .7,
             brks    = 0:7)

invisible(dev.off())
```


\begin{figure}[h!]
\caption{{Demographic Differences: Users in the Framed Field Experiment compared to users in the Natural Field Experiment}}
\centering
\includegraphics[width=.7\linewidth]{02_figures/AllDiff.pdf}
\label{fig:AllDiff}
\caption*{\small{\textbf{Note}: Users in the scaled-up NFE were more likely to be male, better educated, and wealthier than users in the FFE. Data Sources: phone surveys of all system users.}}
\end{figure}


```{r message=FALSE, warning=FALSE, include=FALSE}

# old figure, box plot

pdf(paste0(output_folder, "/AgeDiff.pdf"), width = 6, height = 4)
  par(mfrow=c(1,1))
  boxplot(voters$q10b[voters$q10b>17]~voters$callback[voters$q10b>17], names=c("Pilot", "uSpeak"), xlab="Age")
invisible(dev.off())
        
# Another option is side by side histograms

Pilot <- voters$q10b[voters$callback==0 & voters$q10b>17]
uSpeak <-voters$q10b[voters$callback==1 & voters$q10b>17]
 
 pdf(paste0(output_folder, "/AgeDiffB.pdf"), width = 6, height = 4)
   par(mfrow=c(1,2))
  
   hist(Pilot, freq=FALSE, main=NULL)
   title(main="FFE Users")
  
    hist(uSpeak, freq=FALSE, main=NULL)
   title(main="uSpeak Users")
invisible(dev.off())
```

The patterns here suggest that usage is relatively higher among wealthier, more educated, male citizens.^[Note that flattening is a summary of heterogeneous usage. Overall low usage could (in theory) reflect moderate usage among marginalized citizens combined with no response among the non-poor. Thus, testing for flattening is analytically distinct from exploring overall usage.]  However, the substantive importance of this pattern is lessened by the fact that usage was exceedingly low overall. Even had there been substantial flattening, it would have occurred within what is likely a small part of a politician's information set, and would thus have been very unlikely to affect politicians' subsequent behavior.


## Insensitivity to Price

Unlike in the FFE, we find no evidence of overall sensitivity to price in the scaled-up national program.^[Recall that the price effect is an experimental property; monthly prices are randomized over the study population irrespective of actual usage.] Monthly rates of messaging in the free and full-price treatment conditions are in fact almost indistinguishable. To test for a price effect more formally, we run a linear regression of the number of messages received in a given month on price---a binary variable that takes the value of one for months of full-price messaging and zero for months of free messaging---controlling for the monthly feedback treatment indicator and MP fixed effects. Results presented in Table \ref{tab:PriceEffect} suggest that, contrary to the FFE, price did not significantly affect usage in the scaled-up national program. The substantive magnitude of the estimated effect is tiny, as is the upper bound of the confidence interval: at the upper bound, switching to a free system is associated with less than one additional message per constituency per month (compared to a 2 percentage point increase in messaging in the FFE, a rate which would have resulted in about 30 additional monthly messages per constituency).^[The divergence observed in the price effect across experiments is reminiscent of the way subjects in controlled laboratory experiments react to even small monetary manipulations that are inconsequential outside laboratory settings.]
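
One way to write the specification just described is sketched below. The notation is introduced here for illustration only and is not used elsewhere in the text; the month indicators $\delta_{m}$ reflect the study-month dummies included in the accompanying estimation code.

$$
\text{Messages}_{cm} = \beta\,\text{Price}_{cm} + \gamma^{\prime}\,\text{Feedback}_{cm} + \alpha_{c} + \delta_{m} + \varepsilon_{cm}
$$

Here $c$ indexes constituencies (MPs) and $m$ indexes study months; $\text{Price}_{cm}$ equals one in full-price months and zero in free months; $\text{Feedback}_{cm}$ is a vector of indicators for the feedback treatment arms; and $\alpha_{c}$ are MP fixed effects. The coefficient $\beta$ is the expected change in monthly messages per constituency when messaging moves from free to full price, so a price-sensitive population would yield $\beta < 0$.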



```{r message=FALSE, warning=FALSE, include=FALSE, echo=FALSE}


caseload$Date <- as.Date(strptime(caseload$Datestarted, "%d/%m/%Y %H:%M:%S %p"))

drange<-  seq.Date(range(caseload$Date)[1], range(caseload$Date)[2], by = "day")

g  <- function(i) length(caseload$Date[caseload$Date==i])
f  <- function(i) length(caseload$Date[caseload$Date> (i -4) & caseload$Date < (i+4)])/7
f1 <- function(i) length(caseload$Date[caseload$Date> (i -4) & caseload$Date < (i+4) & caseload$price ==1])/7
f0 <- function(i) length(caseload$Date[caseload$Date> (i -4) & caseload$Date < (i+4)& caseload$price ==0])/7
 
plot(drange, sapply(drange, g))
 mean(sapply(drange, g))
 
 plot(drange,  sapply(drange, f), type = "l")
 lines(drange, sapply(drange, f0) + sapply(drange, f1), col = "red")
 
  wb = as.Date("2012-11-01 03:34:16 EDT")
  we = as.Date("2012-11-30 03:34:16 EDT")
  
  mb = as.Date("2012-09-14 00:00:16 EDT")
  me = as.Date("2012-09-23 00:00:16 EDT")
  
```

```{r message=FALSE, warning=FALSE, include=FALSE, echo=FALSE}
 pdf(paste0(output_folder, "/ma7.pdf"), width = 10, height = 6)
   plot(as.Date(drange), sapply(drange, f), type = "l", ylab = "Daily messages (smoothed)", xlab = "time", ylim = c(0, 24))
   polygon(c(wb,wb,we,we), c(-1000,3000,3000,-1000), col = grey(.9), border=NA)
   polygon(c(mb,mb,me,me), c(-1000,3000,3000,-1000), col = grey(.9), border=NA)
   lines((drange), sapply(drange, f1), type = "l",  col="blue" )
   lines((drange), sapply(drange, f0), type = "l" , col="red" )
   abline(v=as.Date("2012-08-01 03:34:16 EDT"))
   abline(v=as.Date("2012-09-01 03:34:16 EDT"))
   abline(v=as.Date("2012-10-01 03:34:16 EDT"))
   abline(v=as.Date("2012-11-01 03:34:16 EDT"))
   abline(v=as.Date("2012-12-01 03:34:16 EDT"))
   abline(v=as.Date("2013-01-01 03:34:16 EDT"))
   abline(v=as.Date("2013-02-01 03:34:16 EDT"))
   abline(v=as.Date("2013-03-01 03:34:16 EDT"))
   lines((drange), sapply(drange, f), type = "l" , lwd=2.5 )
   text(mb-7.5, 0, "Marketing", pos=4)
   text(wb+2, 0, "Washout", pos=4)
   box()
 invisible(dev.off())
```


```{r, include = FALSE,echo=FALSE}

caseload_month$Messages <- caseload_month$MESSAGES

fixed.1 <- plm(Messages ~ price + factor(StudyMonths), data = caseload_month, index = c("code"), model = "within")
summary(fixed.1)

fixed.2 <- plm(Messages ~ price + factor(feedback) + factor(StudyMonths), data = caseload_month, index = c("code"), model = "within")

summary(fixed.2)
table(caseload_month$feedback)
```


```{r, echo = FALSE, comment = NA, results = 'asis', message=FALSE}

stargazer(fixed.1, fixed.2, dep.var.labels ="Messages per month" , title="Usage as a function of price and feedback", label = "tab:PriceEffect", omit=c("Jan", "Feb", "Sept", "Oct", "Dec"), covariate.labels = c("Price", "Education prompt", "Education plus Feedback Prompt"), header = FALSE) 
```


## No Evidence of Downstream Effects

Thus far we have shown that system usage in the scaled-up uSpeak program was low and that fully subsidizing the cost of messaging did not increase voters' proclivity to contact their MP via SMS. Notwithstanding the low rates of usage, it is possible that uSpeak had a positive effect on voters' sense of efficacy and their satisfaction with politics in Uganda. This would be the case if citizens view the existence of ICT platforms, irrespective of their own usage, as an important tool for strengthening citizen voice. This was a goal of the intervention and we report on it here briefly. Results in this section use experimental estimates of the effects of the intervention, exploiting the random assignment of the scaled-up program.

To test for the effect of the national program on voters' efficacy we turn to our endline survey. The survey, which took place in July-August 2014, included 2,714 adult respondents from 76 constituencies and 304 villages in 52 districts across Uganda.^[See Supplementary Information, Table 3 for descriptive statistics of those survey respondents.]

<!-- NOTE: add map of uganda at the constituency level and show where we survey  -->

To measure efficacy, we asked survey respondents whether they agreed with the following statement: *People like you can do things that can have an influence on the actions of $\dots$ [your constituency MP]*; we then repeated the question for the president, district chair, and traditional leaders, which we use as placebo tests (rather than multiple tests on a common hypothesis). Our key dependent variable is a binary indicator that is equal to one for the 60% of respondents who agreed that citizen action could influence their MP. We then run a simple OLS model regressing the efficacy outcome on a treatment indicator and district fixed effects.
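
The model just described can be sketched as follows; the symbols are introduced here for illustration only.

$$
Y_{i} = \tau\, T_{c(i)} + \eta_{d(i)} + \varepsilon_{i}
$$

Here $Y_{i}$ equals one if respondent $i$ agreed that citizen action could influence the relevant leader, $T_{c(i)}$ equals one if the respondent's constituency was assigned to uSpeak, and $\eta_{d(i)}$ are district fixed effects. Because the outcome is binary, this is a linear probability model, and $\tau$ is read as the difference, in probability units, in the share agreeing between treated and control constituencies within districts. Estimation is weighted; the weight variable is named `IPW` in the estimation code, presumably denoting inverse probability weights.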

Figure \ref{fig:InfluenceEffects} shows that we did not find statistically significant evidence of downstream effects on efficacy. The magnitude of the estimated effect is, however, reasonably large and is perhaps the most optimistic finding from the study. Further exploration, however, does not increase confidence in this result. In the graph we also report results from four "placebo" tests, which assess effects on perceived influence over leaders that are not related to uSpeak. The estimated effects are also large for these outcomes, and indeed significant in two cases. While surprising, the pattern suggests that the intervention did not increase citizens' efficacy with respect to MP engagement relative to its effect on engagement with other political actors.


```{r, include = FALSE}

# Influence Variables
varlist <- c("q67_b_influe_mps_REC2", 
             "q67_a_influe_pres_REC2",
             "q67_c_influe_lc5_REC2", 
             "q67_d_influe_clan_REC2",
             "q67_e_influe_kingdom_REC2")

# estimate the effects
models <- lapply(varlist, function(x) 
  lm(substitute(i ~ uspeak + as.factor(district), list(i = as.name(x))), weights = IPW , data = citizens)
)
                        
table(citizens$uspeak)

################################################################################
# Influence
################################################################################
influence <- c("MP", "President", "District Chair", "Clan Leader", "Kingdom Leader")


# create a data.frame to store yindex, point estimates, upper bound, lower bound 
ci <- sapply(1:length(models), function(i) confint(models[[i]], 'uspeakTreat', level = 0.95))

A <-  data.frame(degree_influence = 1:5,
                 effect   = sapply(1:5, function(i) models[[i]]$coef['uspeakTreat']),
                 lower_ci = ci[1,],   # confint() returns the lower bound first
                 upper_ci = ci[2,])%>%
          mutate(degree_influence = factor(degree_influence, labels = influence), 
                 degree_influence = factor(degree_influence, levels = rev(levels(degree_influence))))



pdf(file = paste0(output_folder, "/InfluenceEffects.pdf"), width=8, height=5) 
  
  pd <- position_dodge(width=0.5)
  d <- ggplot(data = A, aes(x = factor(degree_influence), y = effect))  
  d + geom_point(colour = "black", size = 4) +  theme_bw() + scale_shape_manual(values=c(22,22))  + 
    theme(panel.border = element_blank(), panel.grid.major.y = element_line(),
    panel.grid.major.x = element_blank(), panel.grid.minor = element_blank(),
    axis.line = element_line(colour = "black"),plot.title = element_text(size = 14, face = "bold", vjust = 1.5),
    axis.title.y = element_text(colour="grey20",size=12,face="bold"),
    axis.text.x = element_text(colour="grey20",size=11, face="bold"),
    axis.text.y = element_text(colour="grey20",size=11,face="bold"),  
    axis.title.x = element_text(colour="grey20",size=11,face="bold")) + 
    scale_y_continuous(name="Effect of uSpeak",  limits=c(-.35,.35)) + 
    scale_x_discrete(name="Ability to Influence") +
    geom_errorbar(aes(ymin=lower_ci, ymax=upper_ci), width=0, size=0.6, position=pd) + coord_flip() +
    geom_hline(yintercept = 0, colour="grey", linetype = "longdash", alpha=.4)

dev.off()

########################################################################################
# Hetero effects 
# approach to estimating the effects on influence conditional on hearing about uSpeak
########################################################################################


varlist <- c("q67_b_influe_mps_REC2", 
             "q67_a_influe_pres_REC2",
             "q67_c_influe_lc5_REC2",
             "q67_d_influe_clan_REC2",
             "q67_e_influe_kingdom_REC2")

inter.models <- lapply(varlist,  function(x){
 lm(substitute(i ~ uspeak*q149_know_uspeak_REC2 + uspeak + q149_know_uspeak_REC2 + as.factor(district), list(i = as.name(x))), weights = IPW, data = citizens)})

ci <- sapply(1:5, function(i) confint(inter.models[[i]], 'uspeakTreat:q149_know_uspeak_REC2', level=0.95))


A <-  data.frame(degree_influence = 1:5,
                 effect   = sapply(1:5, function(i) inter.models[[i]]$coef['uspeakTreat:q149_know_uspeak_REC2']),
                 lower_ci = ci[1,],   # confint() returns the lower bound first
                 upper_ci = ci[2,])%>%
          mutate(degree_influence = factor(degree_influence, labels = influence), 
                 degree_influence = factor(degree_influence, levels = rev(levels(degree_influence))))




pdf(file=paste0(output_folder, "/InfluenceEffects_HETERO.pdf"), width=8, height=6) 
  
  pd <- position_dodge(width=0.5)
  d <- ggplot(data = A, aes(x = factor(degree_influence), y = effect))
  d + geom_point(colour = "black", size = 3) +  theme_bw() + scale_shape_manual(values=c(22,22))  + 
    theme(panel.border = element_blank(), panel.grid.major.y = element_line(),
          panel.grid.major.x = element_blank(), panel.grid.minor = element_blank(),
          axis.line = element_line(colour = "black"),plot.title = element_text(size = 14, face = "bold", vjust = 1.5),
          axis.title.y = element_text(colour="grey20",size=12,face="bold"),
          axis.text.x = element_text(colour="grey20",size=11, face="bold"),
          axis.text.y = element_text(colour="grey20",size=11,face="bold"),  
          axis.title.x = element_text(colour="grey20",size=11,face="bold")) + 
    scale_y_continuous(name="Effect of uSpeak") +
    scale_x_discrete(name="Ability to Influence") +
    geom_errorbar(aes(ymin=lower_ci, ymax=upper_ci), width=0, size=0.6, position=pd) + coord_flip() +
    geom_hline(yintercept = 0, colour="grey", linetype = "longdash", alpha=.4)

invisible(dev.off())

```

\begin{figure}[h!]
\caption{{Efficacy Effects}}
\centering
\includegraphics[width=.75\linewidth]{02_figures/InfluenceEffects.pdf}
\label{fig:InfluenceEffects}
\caption*{\small{\textbf{Note}: The marginal effect of uSpeak on political efficacy measured as respondents' perception of their ability to impact their MP.}}
\end{figure}

# Discussion {#S5}

Experimental findings from the national program conflict with the results from the FFE. Notably, uSpeak resulted in low usage, even when the service was offered to voters at no cost. Moreover, confirming concerns that ICTs would exacerbate existing inequalities in political access, when uSpeak was used, it was by and large used by citizens whose voices are already more likely to be heard. In other words, the groups that have the weakest access to political processes were also the least likely to use the new ICT platform.

Here we explore some of the reasons that may account for the low usage of the uSpeak system. In Supplementary Information (section 8) we also assess several explanations for the fact that---contrary to the FFE---marginalized populations were significantly less likely to use the new ICT platform. This question is of somewhat less importance because of the low usage of the uSpeak system: with low take-up, any potential flattening is only with respect to a less important information source, as mentioned above.^[We thank a reviewer for pointing out the implications of low overall usage on the political salience of results on flattening.] For similar reasons we do not explore the reasons behind the lack of downstream effects: because the scaled-up national program generated such weak first-stage results, it is less surprising that voters' and politicians' attitudes and behavior were not affected by the introduction of uSpeak.

Our goal in closely examining the causes of the low usage in the NFE is not merely to account for these diverging results, but rather to use the analysis to derive substantive insights regarding the role ICTs can currently play in improving political communication in low-income countries. Although the FFE led by the research team was meant to capture the key features of the scaled-up national ICT platform, its relatively tight experimental control introduced a number of differences.

We first explore whether differences in results could be due simply to differences in samples, exploiting the fact that the NFE constituencies are a subset of the FFE constituencies. We next explore the explanatory power of two external features of the NFE---which are common to interventions that are scaled-up from controlled pilots to larger-scale programs---that may have been consequential. We refer to these as "scale" and "agent" effects. In addition, we examine the implications of subtle differences in the delivery of the treatment. These "design" effects may be especially relevant for interventions that involve the dissemination of information to subjects. 

Changes in *scale* are often described as a problem of general equilibrium effects [@deaton2010instruments]. This concern is of particular salience when treatment effects are sensitive to the share of treated in the population. Scale effects are of special concern when subjects can accurately infer the magnitude of a program from its delivery method, as is clearly the case in our study. In our setting, it is quite possible that collective action problems get altered substantially as scale increases. Insofar as political communications complement each other, or substitute for each other, increases in scale could lead to greater or lower overall levels of communication. 

A third possible reason for the low usage relates to *agents*. Whereas the research team implemented the FFE, the Parliament of Uganda and NDI led the scaled-up national program.^[Differences in agents across scales are common: for example, the Millennium Villages initiative sought to assess the scope for government-led development change by examining an intervention in which government was not a primary actor.] In our case, this change in agents might have affected citizen expectations regarding responsiveness to their messages. In other words, the fact that the scaled-up national intervention was implemented by Parliament rather than by researchers may have reduced the incentives of the target populations to engage.

The fourth possibility relates to *experimental design* and specifically, to the possibility that details of the mode of treatment delivery---the nuts and bolts of executing field experiments---mattered a great deal for citizens in deciding whether or not to communicate with elites. We focus here on two possibilities. The first is that the method of delivery (radio spots) introduced a *treatment compliance effect*: Ugandans were simply unlikely to hear or internalize appeals issued through mass media, rather than less likely to respond conditional on hearing them. This mode of delivery differs from that of the FFE, where enumerators ensured that respondents unambiguously heard and internalized the information on the SMS platform.

Closely related is the possibility that different methods of delivery have varying degrees of an (implicit) *invitational effect*. It is possible that communication was relatively high in the FFE not simply because the in-person survey context ensured awareness of the new ICT platform, but also because the enumerators had personally invited respondents to contact their MP. Communicated in the context of a survey, such an invitation may appear as a more personal encouragement to engage in politics. A direct personal invitation has an empowering effect, signaling receptiveness and the possibility that political communication will make a difference. These last two "design" mechanisms are closely related yet distinct: one concerns whether an invitation was empowering and deemed personal; the other concerns whether an invitation gets heard at all.

We use a number of strategies to adjudicate between these five explanations. First, we exploit a feature of the scaled-up field experiment in which there was variation in the *feedback* provided to voters on the behavior of others. This allows us to examine whether the exacerbation of collective action problems due to scale can, at least partially, account for the significantly lower usage. Second, we conducted a citizen endline survey with a nationally representative sample, which allows us to assess---albeit with some lag---ex-post differences in treatment compliance. Third, we hired a Ugandan private marketing firm to examine whether the radio stations NDI had contracted indeed played the ads according to the experimental design. Fourth, we implemented an additional "mechanism experiment" in one district in Uganda in which we specifically varied the invitational component. Table \ref{PilottoFull} summarizes the list of potential explanations and the data source used to explore the explanatory power of each.

\begin{table}[ht]
\centering
\caption{{Differences between the FFE and the NFE}}
\label{PilottoFull}
\begin{tabular}{c|l|l}
\textbf{} & \textbf{Type}                         & \textbf{Data Source} \\
\hline
1  & Sample Differences                          & FFE data (citation omitted) \\
  & (Randomization bias)                          & \\
\hline
2  & Scale Effects                          & National experiment \\
  &                           &  (Feedback treatment) \\
\hline
3  & Agent Effects                         & Users survey data (callback)                        \\
\hline
4 & Design Effects I: Treatment Compliance & Citizens survey data (national sample),  \\
   &   & Radio monitoring data \\
\hline
5 & Design Effects II: Invitational Effects & Follow-up Mechanism experiment \\      
\end{tabular}
\end{table}
  



**Randomization Bias**

One concern is that differences in results arise from differences between the study population and the target population.^[We thank a reviewer for highlighting this point of difference.] This is a form of what @heckman1991randomization has called "randomization bias" in the sense, here, that the research team's control over experimentation (or sample) selection ensured that all constituencies took part in the FFE, whereas only one third of constituencies took part in the NFE.

Selection into experimentation can mean that researchers do not have information on how non-subjects would perform in target contexts of interest. In our study, the selection took place for the scaled-up experiment (the target) and not in the controlled experiment. This differs from the usual situation, in which the bias is in the controlled setting rather than in the target case. It means, however, that we are in a position to assess, to some extent, how great randomization bias in the *scale up* is. To explore this, we return to data on the three indices from the FFE: access, political engagement, and marginalization. For each of these we compare mean values for the (self-selected) NFE sample, the uSpeak constituencies (randomly sampled from the NFE constituencies), and the endline constituencies (randomly sampled from the NFE constituencies) against the FFE sample. None of these differences is substantively important. Because the latter two samples are random draws from the NFE constituencies, the first comparison---between the FFE and NFE constituencies---is the most informative. We find here that on these key features there is no significant difference between the FFE and NFE constituencies. Most importantly, the rate of sending messages in the FFE was essentially identical across the two sets of constituencies, and so we conclude that this difference in sample is unlikely to explain differences in behavior in the NFE.

```{r, echo=FALSE}

# clean variable to get constituencies (eacode) in FFE
pcode       <- as.character(ffe$DICT_PCODE2011)
pcode_clean <- as.numeric(substr(pcode, 1, nchar(pcode)-1))

#eacodes in the endline
eacode_endline <- unique(citizens$eacode)

# create marginalized index based on APSR rep code           
index <- ffe %>% mutate(MARG = ffe$MARG,                     # Review -- is this needed?
                        # MARG_RANK = 25  + 50*ffe$MARG,     # Review -- but this is binary??
                        eacode = pcode_clean)

# merge MP data by eacode to get uspeak assignment
mp_const <- unique(MP[,c("selctedforpcs","constituency","eacode")])
mp_const <- mp_const %>% 
  mutate(correct = case_when(constituency ==  "DODOTH COUNTY EAST" ~ "DODOTH EAST COUNTY",
                             constituency ==  "KIBOGA COUNTY EAST" ~ "KIBOGA EAST COUNTY",
                             constituency ==  "BUYAGA COUNTY WEST" ~ "BUYAGA WEST COUNTY",
                             constituency ==  "BUYAGA COUNTY EAST" ~ "BUYAGA EAST COUNTY",
                             constituency ==  "BUNYOLE COUNTY WEST" ~ "BUNYOLE WEST COUNTY",
                             constituency ==  "BUDADIRI COUNTY WEST" ~ "BUDADIRI WEST COUNTY",
                             constituency ==  "KALUNGU COUNTY EAST" ~ "KALUNGU EAST COUNTY",
                             constituency ==  "BUNYOLE COUNTY EAST" ~ "BUNYOLE EAST COUNTY",
                             constituency ==  "NAKASEKE COUNTY SOUTH" ~ "NAKASEKE SOUTH COUNTY"))
mp_const$constituency[!is.na(mp_const$correct)] <- mp_const$correct[!is.na(mp_const$correct)]

index %<>% left_join(., mp_const, by = "eacode")

#merge endline sample data with constituency sampling probabilities
endline_sample %<>% mutate(constituency = as.character(constituency))
index <- left_join(index, endline_sample[, c("srl", "constituency", "sample_probs", "uspeak_sort")], by = "constituency")

# create sample dummies
index %<>% mutate(
  #FFE sample
  ffe = 1,
  #NFE sample from MP `selctedforpcs`
  nfe = 1*(!is.na(selctedforpcs)),
  #NFE sample from endline_sample `uspeak_sort`
  # nfe2 = 1*(uspeak_sort %in% c(0,1)),
  #uSpeak from MP `selctedforpcs`
  uspeak = 1*(selctedforpcs == 1),
  uspeak = ifelse(is.na(uspeak), 0, uspeak),
  #uSpeak from endline_sample `uspeak_sort`
  # uspeak2 = 1*(uspeak_sort == 1),
  #endline
  endline = 1*(index$eacode %in% eacode_endline)
  )

```

```{r, echo=FALSE, warning=FALSE, message=FALSE}
# Construct Table of Indices by Sample
# In the code below, we calculate the average indices of the different samples and estimate the p-value from a test of the null that the mean index for each sample subset equals the out-of-sample mean in the FFE data (i.e., that the coefficient in the `lm_robust` regression of the index on the sample indicator equals 0). Standard errors are clustered at the constituency level.

# create inverse propensity weights for endline sample data
index$wt <- 1/index$sample_probs

outcomes <- c("ACCESS", "ENGAGED", "MARG", "SMS")
ffe_mod <- lapply(outcomes, function(outcome){
  fy <- as.formula(paste0(outcome, " ~ ", "1"))
  tidy(lm_robust(fy, data = index, clusters = eacode))
})

index_means <- function(outcome, sample){
  fy <- as.formula(paste0(outcome, " ~ ", sample))
  if(sample == "endline"){
    index$wt[index$sample_probs == 0] <- NA
    ret <- tidy(lm_robust(fy, data = index, clusters = eacode, weights = wt))
  } else ret <- tidy(lm_robust(fy, data = index, clusters = eacode))
  ret <- ret[ret$term == sample,]
  ret$mean <- mean(index[index[[sample]] == 1, outcome], na.rm = TRUE)
  ret[, c("term", "outcome", "mean", "p.value")]
}
```

```{r, echo=FALSE, results="asis", warning=FALSE, message=FALSE}
grid <- expand.grid(outcome = outcomes, sample = c("nfe", "uspeak", "endline"), stringsAsFactors = FALSE)
index_dat <- mapply(index_means, outcome = grid[,1], sample = grid[,2]) %>% t %>% data.frame
rounding <- 3
index_tab <- data.frame(Index = c("Access", "Political Engagement", "Marginalization", "Sent SMS"),
                        FFE = round(unlist(lapply(1:4, function(x) 
                          ffe_mod[[x]]$estimate)), rounding),
                        NFE = round(unlist(lapply(1:4, function(x) 
                          index_dat[x,3][[1]])), rounding),
                        uSpeak = round(unlist(lapply(5:8, function(x) 
                          index_dat[x,3][[1]])), rounding),
                        Endline = round(unlist(lapply(9:12, function(x) 
                          index_dat[x,3][[1]])), rounding))

index_dat <- mutate(index_dat, stars = 
                      case_when(p.value >= 0.05 ~ "",
                       p.value < 0.05 & p.value >= 0.01 ~ "*",
                       p.value < 0.01 & p.value >= 0.001 ~ "**",
                       p.value < 0.001 ~ "***"))

# Check stars and add on if required
# index_dat$stars
index_tab <- cbind(
  as.character(index_tab[,1]),
  matrix(paste0(as.matrix(index_tab[,-1]), as.vector(index_dat$stars)), 4, 4))

# Add N
index_tab <- rbind(index_tab,
                   c(Index = "N", sapply(c("ffe", unique(grid[,2])), function(x) sum(index[[x]] == 1, na.rm = TRUE))))
# index_tab$Index[5] <- "N"

textab <- capture.output(print(xtable(
  index_tab, label = "tab:indices_samples",
  caption = "Mean indices across experimental samples"),
  caption.placement = "top", comment = FALSE, include.rownames = FALSE))

textab[12] <- paste0(textab[12], " \\hline")
# textab[13] <- paste0("N", textab[12])
textab[16] <- "\\begin{flushleft} Note: Access and Political Engagement indices are standardized (mean equals 0 and standard deviation equals 1). Marginalization index is based on percentile. Columns (2)-(4) indicate sample averages and p-values of null of no difference from out-of-sample means. $^*$ $p<0.05$; $^{**}$ $p<0.01$; $^{***}$ $p<0.001$.\\end{flushleft}"

textab[17] <- "\\end{table}"

cat(textab)

## check `uspeak_sort` against `selctedforpcs`
# table(index$uspeak_sort, index$selctedforpcs, useNA = "a")
```


**Scale Effects**

Political communication is subject to collective action problems. If many others are visibly lobbying a politician, free-ridership may become more likely if messages work as substitutes. By contrast, visibility of lobbying can improve voters' coordination leading to a cascade of usage, assuming a sufficient number of voters view messages as complements, as in @ferrali2018peer. 

We can imagine two ways in which a logic of this form can play out. First, if politicians are better informed about others, and better able to target resources to them, this can increase incentives to provide a politician with information on one's own preferences. Second, scale may also result in lower individual contributions through a simple logic of substitution among members of a given group. In the extreme case in which information from constituency members is perfectly substitutable and citizens face linear costs, an increase in the number of potential information providers would not alter the total amount of information provided in equilibrium, which in turn implies a corresponding reduction in per capita information provision.
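
The substitution logic can be illustrated with a stylized sketch. The symbols $n$, $x_i$, and $X^*$ are our own notation for this illustration and are not part of the formal analysis. Suppose the politician responds only to the total information $X = \sum_{i=1}^{n} x_i$ supplied by $n$ potential providers, and that equilibrium total provision $X^* > 0$ does not depend on $n$. In a symmetric equilibrium,

$$x_i^* = \frac{X^*}{n}, \qquad \frac{\partial x_i^*}{\partial n} = -\frac{X^*}{n^2} < 0,$$

so that, for example, doubling the number of potential providers halves per capita provision.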

Conversely, one could also construct examples in which when many others are lobbying for a common good there may be increasing returns to lobbying (or here too, there may also be increasing incentives to free ride). In short, if the incentives to use technological innovations for political communication depend on the perception of how others are engaging [@ferrali2018peer], then outcomes at a small scale may look very different to outcomes at a large scale. From this perspective, weaker participation from the scaled-up program may reflect a simple failure of collective action. 

In order to understand if changes in scale induced free-riding, we look for differences in outcomes due to our *feedback* treatment. Recall that in the scaled up national experiment, we exogenously varied the information constituents received about the level of activity by other voters in previous periods. In particular, a random subset of constituencies was informed, through the short radio ads, that other voters had been using the system to mainly raise issues around education. Under a free-riding logic, such information would depress engagement among those exposed to it.

Returning to Table \ref{tab:PriceEffect}, however, we find no evidence of sensitivity of engagement to information on usage by others---neither the difference between feedback on education messages and standard marketing spots, nor differences between education marketing with and without feedback is significant. This is consistent with a set of analyses we conducted on the data from the FFE in which we found no evidence for strategic engagement with the system. We conclude that scale, by itself, does not seem to be a key factor driving our divergent results in terms of overall citizen communication.  

**Agent Effects**

Another possible reason for the low system usage witnessed in the scaled-up national program is *agent effects*. Citizens' usage of mobile messaging plausibly increases with the belief that there is a receptive representative on the other side of the interaction [@sjoberg2017effect]. Citizens will be less likely to communicate if they think that a political actor cares less about their interests. More subtly, they will also be less likely to participate if political actors are better informed about the interests of rival constituents. 

Which system should voters expect to produce greater responsiveness by politicians? On one hand, unlike our FFE, the scaled-up national program is formally owned and led by Parliament, which signals some level of commitment by politicians. In addition, the dynamic nature of the scaled-up program (i.e., the ability of MPs to interact with citizens directly via the ICT platform) further allows MPs to signal their responsiveness directly. This sort of dynamic reciprocal relationship could not have been established in the 'one-shot' controlled FFE. On the other hand, in the scaled-up program, the communication between citizens and politicians was direct, whereas in the FFE this relationship was mediated by the research team, which was responsible for delivering the messages to survey respondents' respective MPs. Citizens may believe that their MP will take their messages more seriously if researchers or an NGO mediates the relationship between voters and representatives; for example, if it follows up in case some messages get ignored. Thus, it is hard to predict *a priori* how the change in the implementer's identity would affect citizen communication. We explore (non-experimentally) agent effects in two ways.

```{r, include=FALSE, echo=FALSE}
names(callback)
table(callback$Received_Response)

shareresp <- callback %>%  group_by(Name_of_MP) %>% 
        dplyr::summarize(mean=mean(Received_Response, na.rm=T),
                  sd=sd(Received_Response, na.rm=T), N = n(), se = sd / sqrt(N),
                  ciMult= qt(0.95/2 + .5, N-1), ci= se * ciMult, ymin=mean-ci, ymax=mean+ci, Y="Response Rate")

hist(shareresp$mean)

noresponse <- length(which(shareresp$mean == 0))


(received_response <- floor(mean(callback$Received_Response, na.rm = TRUE)*100))
```

First, we use the callback survey ($n=$ `r nrow(callback)` uSpeak users) to calculate MPs' response rates at the constituency level and then test for a correlation between MPs' responsiveness and the volume of messaging at the constituency level. We find that only `r received_response` percent of uSpeak users report ever hearing back from their MP; in fact, in almost half of the treated constituencies (`r noresponse`) *not a single uSpeak user had received any response from their MP*. Moreover, analyzing system login information, we find that the majority of MPs did not read many (or any) of the messages sent to them. As expected, we find a positive correlation between messaging and responsiveness, which is consistent with citizens' low engagement being a rational response to their MP's (in)action during the scaled-up study period.

Second, although the callback survey analysis focuses on system users, a self-selected group, we can also assess whether broader expectations regarding MP inaction may have contributed to the low usage rate among the general population. Here we examine responses in the citizen endline survey, in which our nationally representative sample was asked to indicate reasons why people might not use SMS platforms such as uSpeak to communicate with their MPs. APPENDIX FIGURE (top left) provides information on the share of respondents in treatment constituencies that indicate each possible reason. Tellingly, we find that close to 50 percent of respondents report that they would not send a message because they do not expect their MP to be responsive, and about a quarter report a reluctance to contact their MP via text-messaging out of fear of bad repercussions. 

We do not have information on the expectations of responsiveness from MPs in the (one-shot) FFE and so cannot compare those directly. Nevertheless, the statements by citizens and the very weak responsiveness by politicians suggest that the low engagement with the scaled-up program was a rational response on the part of citizens. 

```{r, include=FALSE}
## Summarizes data.
## Gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
##   data: a data frame.
##   measurevar: the name of a column that contains the variable to be summarized
##   groupvars: a vector containing names of columns that contain grouping variables
##   na.rm: a boolean that indicates whether to ignore NA's
##   conf.interval: the percent range of the confidence interval (default is 95%)

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
  
  # New version of length which can handle NA's: if na.rm==T, don't count them
  length2 <- function (x, na.rm=FALSE) {
    if (na.rm) sum(!is.na(x))
    else       length(x)
  }
  
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- plyr::ddply(data, groupvars, .drop=.drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm = na.rm),
                     mean = mean   (xx[[col]], na.rm = na.rm),
                     sd   = sd     (xx[[col]], na.rm = na.rm)
                   )
                 },
                 measurevar
  )
  # Rename the "mean" column    
  datac <- plyr::rename(datac, c("mean" = measurevar))
  
  datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean
  
  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval: 
  # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
  ciMult <- qt(conf.interval/2 + .5, datac$N-1)
  datac$ci <- datac$se * ciMult
  
  return(datac)
}

############################################
# Clean data
############################################
table(citizens$uspeak)

citizens$gender <- factor(citizens$gender, labels=c("Male","Female"))
table(citizens$gender)

############################################
# heard of uSpeak by gender and education
############################################
setnames(citizens, "gender", "Sex")
setnames(citizens, "q149_know_uspeak_REC2", "q149N")
citizens$Education <-  citizens$q37_educ_REC2
citizens$Education[citizens$Education==5] <-"4"
citizens$Education <- factor(citizens$Education, labels=c("No Edu", "Some Primary", "Primary", "Secondary", "Post-Sec"))

table(citizens$Education)

knowD  <- dplyr::select(citizens, "q149N","Education","Sex", "uspeak")
knowDT <- knowD[knowD$uspeak =="Treat",]

# know uspeak by gender, entire sample 
(summarySE(knowD, measurevar="q149N", groupvars="Sex"))
# know uspeak by gender, only treatment
(summarySE(knowDT, measurevar="q149N", groupvars="Sex"))

# know uspeak by treatment status, entire sample 
(summarySE(knowD, measurevar="q149N", groupvars="uspeak"))


# know uspeak by gender and education, only treatment
(tgc <- summarySE(knowDT, measurevar="q149N", groupvars=c("Education","Sex")) %>%
        dplyr::mutate(lower_ci = q149N - ci,
               upper_ci = q149N + ci))


pd <- position_dodge(width=0.45)
pdf(file= paste0(output_folder, "/HearduSpeak.pdf"), width=7, height=4) 

ggplot(tgc, aes(x=Education, y=q149N, color=Sex)) +geom_point(size = 3.5, alpha=.7, position=pd) +
  scale_size_area() + scale_colour_brewer(palette="Set1")+  ggtitle("Ever Heard of uSpeak") +
  scale_y_continuous(name="Share that know uSpeak",  breaks = round(seq(min(0), max(0.8), by = 0.1),1), limits = c(0, 0.8))  +
  scale_x_discrete(name="") + theme_bw() + geom_errorbar(aes(ymin=lower_ci, ymax=upper_ci), width=0, size=0.6, position=pd) +
  theme(panel.border = element_blank(), panel.grid.major.y = element_line(), panel.grid.major.x = element_blank(), 
         panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"), 
         plot.title = element_text(size = 14, face = "bold", vjust = 1.5),
         axis.title.y = element_text(colour="grey20",size=12,face="bold"),
         axis.text.x = element_text(colour="grey20",size=11, face="bold"),
         axis.text.y = element_text(colour="grey20",size=11,face="bold"),  
         axis.title.x = element_text(colour="grey20",size=11,face="bold"))
dev.off()


# in-text stats


(heard_control <- (with(citizens, 
                       round( mean(q149N[uspeak_sort == 0 ] )*100) ))) 

(heard_treat <- (with(citizens, 
                       round( mean(q149N[uspeak_sort == 1 ] )*100) ))) 
```

**Treatment Compliance**

It is possible that the difference between the NFE and the FFE is simply due to an insufficiently strong first stage; i.e., that the radio ads had a limited reach and thus an overwhelming majority of constituents never heard of the uSpeak program. To test for *compliance* effects we asked respondents in our citizen endline survey directly whether they had ever heard about uSpeak. The survey was implemented in 50 parliamentary constituencies approximately one year after the six-month radio campaign, though at a time when the uSpeak system was still active and promoted by Parliament. In light of the time gap, we used a deliberately strong prime, which entailed playing the original radio ad and asking respondents if they had heard of the service (uSpeak) the ad sought to promote. 

Starting with the raw data, we find that about `r heard_control`% of respondents in control constituencies and `r heard_treat`% of respondents in treatment areas self-report that they have ever heard of uSpeak.^[We note that the reported rate of "compliance" is likely an upper bound due to the possibility of social desirability bias.] Note that control respondents are not necessarily misrepresenting their knowledge of the program, because radio signals normally have a range that encompasses more than a single parliamentary constituency.^[Hearing the message in control areas does not imply non-compliance since the ads were tailored to employ the name of treated MPs only. Control subjects could be aware that others were treated but this does not make it possible for them to take up the treatment.] Testing for a first stage more formally, we take a conservative approach: first, we code respondents who live in constituency $j$ as treated if either their constituency (usually male) MP or their district (women) MP was assigned to the uSpeak program, and then we calculate the share of the constituency that heard about uSpeak, adjusting for survey sampling weights. Regression results at the constituency level, using inverse propensity weights based on both constituency and district assignment probabilities, are reported in Table \ref{tab:FirstStage}. We find a large, positive, and significant first stage (column 1). This result is robust to whether or not we control for (aggregated) individual-level covariates (column 2) and whether or not we add fixed effects for the randomization stratification blocks (columns 3-4).
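
To fix ideas, the inverse propensity weights can be sketched as follows, under the assumption (made here for illustration) that constituency and district assignments are independent. Let $p^C_j$ and $p^D_j$ denote the probabilities that the constituency MP and the district MP of constituency $j$ were assigned to uSpeak; this notation is ours. The probability that at least one of the two was assigned is

$$\pi_j = 1 - (1 - p^C_j)(1 - p^D_j),$$

and the corresponding weight is $w_j = 1/\pi_j$ for treated constituencies and $w_j = 1/(1 - \pi_j)$ for control constituencies.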


```{r compliance_prep, include = F}
  

  #aggregate income, education, and knowledge of uSpeak vars to the constituency level using citizens sampling weights
  
  citizens_agg <- citizens %>%
  dplyr::group_by(eacode) %>%    
     dplyr::mutate(heard = weighted.mean(q149N, 1/sample_prob, na.rm = T),
                    educ = weighted.mean(q37_educ_REC2, 1/sample_prob, na.rm = T),
                    inc = weighted.mean(q43_avg_income_REC1, 1/sample_prob, na.rm = T)) %>%
      dplyr::select(EA_districtcode = C_dist_codeec2011 , eacode, C_PROB, D_PROB, region, heard, inc, educ) %>%
           # aggregate knowledge of uSpeak at district and constituency level
      unique()

    
  #total messages per MP
  sms_agg <- sms %>%
    dplyr::group_by(eacode) %>%
    dplyr::summarize(sms.mp = n()) %>%
    drop_na(sms.mp)

 #aggregate registered voters from the election results to constituency and district
  results_agg <- results %>%
    dplyr::group_by(electoral_code, district_code11) %>%
    dplyr::summarize(regvoters.const = sum(Registered.Voters, na.rm=T)) %>% 
    dplyr::rename(eacode = electoral_code, EA_districtcode = district_code11) %>%
    unique()
  
  cmp <- MP %>% dplyr::rename(EA_districtcode = district_code, Name = MP_name, uspeak = selctedforpcs) %>%
    subset(uspeak %in% c(0,1)) %>%
    subset(mptype == "CMP") %>%     #keep constituency MPs only
    dplyr::select(id, s_code, Name, eacode, constituency, district, EA_districtcode, bin, uspeak, mptype) %>%
    unique()
  
  
  
  treated_dist <- unique(MP$district_code[MP$selctedforpcs==1]) 
  
  #merge endline and sms data to mp data (by MP type)
  dat <- left_join(cmp, sms_agg, by = "eacode") %>%
         left_join(citizens_agg[,!names(citizens_agg) %in% "EA_districtcode"], by = "eacode") %>%
         mutate( uspeak_any = ifelse(EA_districtcode %in% treated_dist , 1, uspeak )) %>%
         left_join( results_agg[,c("eacode", "regvoters.const")], by = "eacode") %>%
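         #sms.mp: if missing messages data, 0 messages were received
         #uptake: number of SMS per registered voter in each constituency
         #IPW: inverse propensity weights based on C_PROB, the probability of
         #constituency assignment into treatment
         #IPW_any: inverse propensity weights based on the probability that either
         #the constituency MP (C_PROB) or the district MP (D_PROB) was assigned
         # Review -- IPW_any is conditioned on `uspeak`; check whether `uspeak_any` was intended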
         mutate( sms.mp  =  ifelse(is.na(sms.mp), 0 , sms.mp), 
                 uptake  =  sms.mp/regvoters.const,
                 IPW_any = ifelse(uspeak == 1, 1/(1-((1-C_PROB)*(1-D_PROB))), 1/((1-C_PROB)*(1-D_PROB))),
                 IPW     = ifelse(uspeak==1, 1/C_PROB, 1/(1-C_PROB))) 
         
         
        
  
 
  

  
  #check
  sum(is.na(dat$uspeak))
  sum(is.na(dat$bin))
  sum(is.na(dat$uptake))
  sum(is.na(dat$IPW))
  sum(is.na(dat$heard))
  
```




```{r compliance, include=F}

# treatment defined by constituency and district MP assignment
dat_robust <- dplyr::select(dat, uptake, heard, bin, uspeak_any, IPW_any, educ , inc) %>% 
          dplyr::filter(complete.cases(.))


## Group bin 14 and AUTO together
table(dat_robust$bin)
# bin = 14 and bin = auto have only one obs

dat_robust$bin[dat_robust$bin == "AUTO"] <- 14
table(dat_robust$bin)


formulae <-  c("heard ~ uspeak_any",
               "heard ~ uspeak_any + educ + inc",
               "heard ~ uspeak_any + bin",
               "heard ~ uspeak_any + educ + inc + bin")
terms    <-  c("uspeak_any", "educ", "inc", "(Intercept)")


lm_out <- lapply(formulae, function(x)   # lapply always returns a list of models, so lm_out[[i]] is safe
            lm_robust(formula = as.formula(x), data = dat_robust, weights = IPW_any, se_type = "stata"))


out <- lapply(1:4, function(i) {
  tidy_out  <-   tidy(lm_out[[i]])
  index <- which(tidy_out$term %in% terms) 
  tidy_out <- tidy_out[index, ]
  
  stars <- sapply(tidy_out[ ,"p.value"] , 
                  function(p) {ifelse(p <= .01, "***", 
                                      ifelse(p<= .05, "**", 
                                             ifelse(p<= .1, "*", "")))})
  
  a <-  rbind( paste0( round(tidy_out[ ,"estimate"],3),  stars),  
               paste0("(", round(tidy_out[,"std.error"],3) , ")"))
  
  
  colnames(a) <- tidy_out[,1]
  
  a})



tab <- do.call(rbind, lapply(terms, function(x)
  sapply(1:4, function(i) {
    if(x  %in% colnames(out[[i]]) ) c (out[[i]][,x], "" )
    else rep("",3)} )) )


tab  <- cbind (c("Treatment","","", "Education","","",  "Income","" ,"","Constant","","", "Block FEs", "N"),
               rbind(tab, 
                     c("no", "no", "yes", "yes"),
                     sapply(1:4, function(i) lm_out[[i]]$N)))



```



```{r compliance_print1, include=FALSE, echo=FALSE, message=FALSE}


rownames(tab) <- c("Treatment","","", "Education","","",  "Income","" ,"","Constant","Constant se","", "Block FEs", "N")
tab[c("Constant", "Constant se"),4:5] <- ""


xtab <- xtable(tab, caption ="First Stage for Constituency MPs only", align = c("l","l", rep("c",4)), label = "tab:FirstStage")
addtorow <- list()
addtorow$pos <- list(0, 14)
addtorow$command <- c(paste0("&\\multicolumn{4}{c}{\\textit{Dependent variable:}} \\\\\n", 
                             "\\cline{2-5} \\\\\n", 
                             " &\\multicolumn{4}{c}{Knowledge of uSpeak}  \\\\\n",
                             "  &(1) & (2) & (3) & (4)  \\\\\n"), 
                     "\\hline \\hline
               \\textit{Note:} 
              & \\multicolumn{4}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\\\\n"
)




fileConn<-file(paste0(output_folder, "/tab_compliance1.tex"))
writeLines( 
  print.xtable(xtab,
               add.to.row = addtorow, 
               include.rownames = FALSE, 
               include.colnames = FALSE,
               comment = FALSE,
               booktabs = TRUE,
               hline.after=c( -1, -1 ,0, 12),
               table.placement = "!htbp",
               caption.placement = "top"), 
  fileConn)
close(fileConn)

    
```


\input{"02_figures/tab_compliance1.tex"}


Second, we estimate constituency-level SMS sending rates (take-up) as a function of hearing about uSpeak using two-stage least squares regressions. This analysis allows us to compare the take-up rate conditional on hearing about the program in the NFE directly against that of the FFE; it also serves as a reality check for the first-stage regression, since the dependent variable here is a behavioral measure derived from the uSpeak database. As reported in Table \ref{tab:SeconStage}, the estimated take-up rate among those who heard about uSpeak is 0.002, with a standard error of 0.001; in other words, the take-up rate is between 1/4 and 1/10 of 1 percent. Comparing this to a take-up rate of about 5% in the FFE (which is larger by a factor of about 25), we can confidently reject the null of no difference between NFE take-up among compliers and our estimated FFE take-up. 
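
In the specification without block fixed effects, and with a binary instrument, the two-stage estimate reduces to a ratio of two differences in means (the Wald estimator). As a sketch in notation introduced here for illustration, let $Z_j$ be the treatment assignment, $H_j$ the share of constituency $j$ that heard of uSpeak, and $U_j$ the take-up rate. Then

$$\hat{\beta}_{IV} = \frac{\bar{U}_{Z=1} - \bar{U}_{Z=0}}{\bar{H}_{Z=1} - \bar{H}_{Z=0}},$$

where bars denote (weighted) sample means within assignment groups. The numerator is the effect of assignment on take-up and the denominator is the first stage. The factor quoted above follows from $0.05 / 0.002 = 25$.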

This analysis has two implications. On the one hand, radio is not a strong marketing device. Indeed, when probing deeper into respondents' knowledge of the uSpeak program, we find that only 6 percent of treatment respondents were able to confidently say that their MP had participated in the program. Moreover, when asked to repeat the four-digit short code, less than half a percent of respondents in treated constituencies claimed to know the short code for sending a text message to their MP, and an additional 3% reported that they once knew the number but had since forgotten it. These findings strongly point to the limits of radio marketing in generating sufficient awareness of the new service. 

On the other hand, this is clearly not the full story. The take-up differential cannot be fully explained simply as a function of a weaker first stage in the NFE, since the two-stage analysis suggests that the effects on those who do get the message are much weaker than in the FFE.^[A back-of-the-envelope calculation suggests that if messaging had gotten through to 10% of up to 10 million subjects and 5% of these had responded, there would have been 50,000 messages entering the system.]  
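
The back-of-the-envelope figure in the footnote is the product of its three inputs, which are illustrative values rather than estimates:

$$\underbrace{10{,}000{,}000}_{\text{subjects}} \times \underbrace{0.10}_{\text{reached}} \times \underbrace{0.05}_{\text{responding}} = 50{,}000 \text{ messages}.$$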

```{r iv_compliance, include=F}

formulae <-  c("uptake ~ heard | uspeak_any",
               "uptake ~ heard + bin | uspeak_any + bin")
terms    <-  c( "heard",  "(Intercept)")


iv_out <- lapply(formulae, function(x)   # lapply always returns a list of models, so iv_out[[i]] is safe
            iv_robust(formula = as.formula(x), data = dat_robust, weights = IPW_any, se_type = "stata"))


out <- lapply(1:2, function(i) {
                      tidy_out  <-   tidy(iv_out[[i]])
                      index <- which( tidy_out$term %in% terms) 
                      tidy_out <- tidy_out[index, ]
                      
                      stars <- sapply(tidy_out[ ,"p.value"] , 
                                      function(p) {ifelse(p <= .01, "***", 
                                                          ifelse(p<= .05, "**", 
                                                                 ifelse(p<= .1, "*", "")))})
                      
                      
                       a <-  rbind( c(paste0(round(tidy_out[1 ,"estimate"], 5),stars[1]),
                                     paste0(round(tidy_out[2 ,"estimate"], 3),stars[2])),  
                                   c( paste0("(", round(tidy_out[1,"std.error"],3) , ")") , 
                                      paste0("(", round(tidy_out[2,"std.error"],3) , ")")) )
                      
                      colnames(a) <- tidy_out[,1]
                      
                      a})



tab <- do.call(rbind, lapply(terms, function(x)
  sapply(1:2, function(i) {
    if(x  %in% colnames(out[[i]]) ) c (out[[i]][,x], "" )
    else rep("",3)} )) )


tab  <- cbind (c("Knowledge of uSpeak","","", "Constant","","", "Block FEs", "N"),
                rbind(tab, 
                     c("no", "yes"),
                     sapply(1:2, function(i) iv_out[[i]]$N)))

```



```{r compliance_print2, include=FALSE, echo=FALSE, warning=FALSE, message=FALSE}


xtab <- xtable(tab, caption ="Second Stage for Constituency MPs only", align = c("l","l", rep("c",2)) , label = "tab:SeconStage")
addtorow <- list()
addtorow$pos <- list(0,8)
addtorow$command <- 
  c(paste0("
          &\\multicolumn{2}{c}{\\textit{Dependent variable:}}\\\\\n", 
          "\\cline{2-3} \\\\\n", 
          " &\\multicolumn{2}{c}{Uptake}\\\\\n",
            "& (1) & (2)\\\\\n"),
            "\\hline \\hline  
            \\textit{Note:} 
              & \\multicolumn{2}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\\\\n" 
)



fileConn<-file(paste0(output_folder, "/tab_compliance2.tex"))
writeLines( 
  print.xtable(xtab,
               include.rownames = FALSE, 
               include.colnames = FALSE,
               booktabs = TRUE,
               add.to.row = addtorow, 
               comment = FALSE,
               hline.after=c(-1, -1, 0, 6),
               table.placement = "!htbp",
               caption.placement = "top"), 
  fileConn)
close(fileConn)
    
```
   
\input{"02_figures/tab_compliance2.tex"}


   

**Invitational Effects**

We turn to explore the possibility of a second design effect; namely, that the marketing tools used to inform citizens about a new service or program likely have (unintentional) invitational effects. Recall that the two experiments differed in their mode of marketing: whereas the scaled-up national program used 30-second radio ads, in the FFE respondents were invited by enumerators to contact their MP in the context of an in-person survey. As mentioned, a direct personal invitation may have an empowering effect, or it may signal greater government responsiveness. Multiple logics could underpin these effects: an invitation could in principle change a voter's beliefs about how much the politician cares about their welfare, as well as about the politician's knowledge of citizens' preferences. As further discussed below, if such invitational effects operate differently for marginalized and non-marginalized populations, this could account for the differences in observed flattening effects. 

To assess the role personal invitations play in the decision to politically engage using an ICT platform, we implemented a third (mechanism) experiment. To do so, we made use of an existing SMS platform, UBridge, which has been operating in Arua district since late 2014. UBridge was developed in partnership between UNICEF's \href{http://ureport.ug/}{Ureport} platform and Uganda's Governance, Accountability, Participation and Performance \href{http://www.rti.org/page.cfm?obj=BE95C6d2-B458-6596-7972754060938701}{[GAPP]} project.^[With some loss in external validity, our design aims to keep the *treatment compliance* effect constant by focusing on respondents in the UBridge system. We hope that parsing the outcome compliance effect will be the focus of future studies.] Unlike uSpeak, which connects citizens with national MPs, UBridge was designed to open a new channel of communication from citizens to local government officials to specifically report problems of public service delivery. UBridge was launched as a pilot study in over 100 villages across Arua. A study evaluating the effect of getting access to the UBridge system is underway and is not the subject of this paper. At the time of our `mechanism' experiment, UBridge had $4,568$ registered users, of which $2,720$ were explicitly verified by the research team.^[We verified the identity of registered users through a call center that we set up with the help of Innovations for Poverty Action, Uganda.] 

On June 13, 2015, UBridge conducted a baseline poll using a robocall system asking users about their attitudes toward budgetary processes. The key outcome of interest is a binary variable that takes the value of 1 if the UBridge user responded to the poll, and 0 otherwise. Of the $2,720$ verified users, 12% responded to the opinion poll and shared their views with UBridge. To explore the effect of direct invitations on levels of ICT-based political engagement, we asked UBridge to run a modified version of their baseline poll, this time experimentally introducing a modest variation in their outreach activities. All users were invited to participate in an opinion poll regarding taxation, similar to the previous UBridge poll. In a randomly selected treatment group, however, UBridge preceded the call with a set of (blast) text-messages that explicitly invited participants to take part in the weekend poll and that highlighted the importance of individual responses. Further details on the block randomization used in this experiment, as well as the full text of the treatment text-messages, are provided in the Supplementary Information, Section 7. 

Our primary measure is the response (or non-response) by UBridge users to the weekend opinion poll. The encouragement text-messages were delivered on 24, 25, and 26 June 2015, and the poll took place on 26 June. We estimate average treatment effects using a regression that accounts for block fixed effects. Our analysis takes account of the variables used for blocking but introduces no further controls. Our primary regression uses only the verified subset of UBridge users, whereas our secondary analysis includes all registered users whether or not they have been positively verified. 
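
As a sketch of this specification (the notation is introduced here for exposition only and is an assumption, not taken from the pre-analysis plan), let $Y_{ib}$ indicate whether user $i$ in block $b$ responded to the poll, let $Z_{ib}$ indicate assignment to the invitation text-messages, and let $\alpha_b$ denote block fixed effects. The base regression can then be written as

$$Y_{ib} = \alpha_b + \tau Z_{ib} + \varepsilon_{ib}.$$

Differential responsiveness by gender can be examined by adding a demeaned male indicator $\tilde{M}_{ib}$ and its interaction with treatment:

$$Y_{ib} = \alpha_b + \tau Z_{ib} + \gamma \tilde{M}_{ib} + \delta \left(Z_{ib} \times \tilde{M}_{ib}\right) + \varepsilon_{ib},$$

where $\delta$ captures the flattening term and, because $\tilde{M}_{ib}$ is normalized to have mean zero, $\tau$ can still be read as the effect of invitation at the average gender composition.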

Results, reported in Table \ref{tab:MechanismOLS} (column 1), suggest that invitation had a large positive effect on response rate: 2 percentage points from a base rate of 9.4% for the control group (though it did not have a differential effect by gender: columns 2-3). These results are consistent with findings reported by @GrossmanMichelitchSantamaria2015 in a similar context, and by @dale2009don and @malhotra2011text in the USA. We note that even though the invitation tested in the mechanism experiment was relatively weak (three text-messages) compared to the in-person invitation used in the FFE, it was able to increase participation rates by over 20%. The evidence at hand allows us to conclude that, consistent with insights from the voter mobilization literature [e.g., @green2008get], more personal invitations can have a powerful effect on rates of participation. We can further conclude that, at least in low-income countries with characteristics similar to Uganda, short radio ads likely represent a marketing strategy that is too impersonal to mobilize large-scale participation.
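
Taking the rounded estimates reported above at face value, the relative increase in participation can be checked directly:

$$\frac{2 \text{ percentage points}}{9.4\%} \approx 0.21,$$

that is, an increase of roughly 21% over the control group's response rate.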



```{r, include=FALSE, echo = FALSE}

# Check randomization implemented correctly
#table( exp_results$Z, exp_results_old$Z)

# Rescale two vars to be consistent with preanalysis code
# Normalize for interaction term to allow easier interpretation
 exp_results %<>% mutate(gender  = gender -1,
                        gender_norm1 = gender - mean(gender[gender <= 1], na.rm = TRUE), 
                        gender_norm2 = gender - mean(gender[verified==1 & gender <= 1], na.rm = TRUE),
                        verified = verified - 1,
                        flattening_norm1 = Z*gender_norm1,
                        flattening_norm2 = Z*gender_norm2)
                

#table(exp_results$gender, exp_results_old$gender)
  
#table(exp_results$verified, exp_results_old$verified)


table(exp_results$flattening)

# Dummy data and Analysis
exp_results$flattening <- exp_results$flattening_norm1
M1 <- lm(CompleteSurvey ~ gender_norm1+ Z +flattening + as.factor(block),
         data = exp_results[exp_results$verified==1 & exp_results$gender <= 1, ])
exp_results$flattening <- exp_results$flattening_norm2

M2 <- lm(CompleteSurvey ~ gender_norm2+ Z + flattening + as.factor(block), 
         data = exp_results[exp_results$gender <= 1, ])

M0 <- lm(CompleteSurvey ~ Z + as.factor(block), data = exp_results[exp_results$verified==1 & exp_results$gender <= 1, ])

```


```{r, echo=FALSE, comment=NA, results='asis'}

stargazer(M0, M1, M2, type = "latex", title="Mechanism Experiment", header=FALSE, 
         column.labels=c("Base", "Primary", "Secondary"), font.size ="small",
         covariate.labels=c("Invitation", "Flattening (Male*Invitation)"), label="tab:MechanismOLS",
         omit = c("block", "gender", "gender_norm1", "Constant"), omit.stat=c("f", "ser"), 
         notes="\\textit{Male}  normalized to have 0 mean. *$p<0.1$", dep.var.labels   = "Provided policy input", notes.append = FALSE)
```


```{r, echo = FALSE}
participation_results <- function(G=0, X=0) {
  round(mean(dplyr::filter(exp_results, gender==G & Z==X & verified ==1)$CompleteSurvey, na.rm = TRUE), 3)*100}

```






# Conclusion {#sec:discussion}

This study integrates three related field experiments designed to assess whether innovations in information communication technologies (ICTs) can be harnessed to improve weak political communication, prevalent in many low-income countries. Evidence from a framed field experiment (FFE) conducted before rolling out a national program suggested not only that there is underlying demand to contact representatives using mobile technology, but also that ICTs have a genuine potential to increase levels of political engagement in a way that flattens access for marginalized populations. By contrast, when brought to scale using a natural field experiment (NFE) implemented nationwide, we find significantly lower levels of citizen engagement, with marginalized populations especially refraining from using the ICT platform to raise voice. These results have implications for theory, policy, and research methodology. 

Our study contributes primarily to our understanding of the promises and pitfalls of ICT-based political communications, at least in the context of low-income countries. Consider four findings that help account for the weak usage of Uganda's national parliamentary communication system.

First, we learned from the FFE that a nontrivial share of citizens, including especially marginalized citizens, want to communicate with their representatives in government using new technological innovations, and are willing to pay to do so. This stands in contrast to accounts of disengagement as reflecting alienation or apathy. We also know that many---though clearly not all (see APPENDIX FIGURE)---have the capacity and means to do so. The results from the FFE support the idea that mobile technology could, under the right conditions, change the relationships between voters and representatives in the developing world. An examination of the scaled-up system alone would have masked this core insight.

Second, from an experimental manipulation in the NFE we found little evidence that the differences across field experiments are due simply to scale effects. Specifically, we do not find evidence that system usage is a strategic response to how many others are contacting their MP via the ICT platform. We believe that improving our understanding of the conditions under which constituents might view IT-based communication with public officials as complements or substitutes is an important avenue for future research. 

Third, complier analysis suggests that although the "first-stage" of treatment using radio ads was not large, given the scale of the experiment it was far from being weak enough to account for the low engagement. Indeed, the estimated complier effect in the NFE is about 4% of the effect on usage observed in the FFE. 

Fourth, from the `mechanism' experiment we learned that there is a reasonably strong responsiveness to personal invitations to engage politically when interest articulation is at stake, but we do not find evidence of the kind of differential responsiveness that would be needed to account for differences in flattening effects across experimental settings.   

These findings suggest that the disappointing results of the uSpeak program are *not driven by weak demand*. Rather, survey evidence suggests weaknesses in the marketing of the system itself. While a relatively large number of constituents were exposed to the radio ads, citizens had difficulty retaining and internalizing the information needed to act, conditional on hearing about the service. Moreover, our analysis suggests that *agent effects* (i.e., the change in the identity of the implementer, which was easily observed by experimental subjects) have likely been very consequential. Specifically, we find strong evidence that a general lack of trust in the responsiveness of politicians is preventing engagement but is also rational. Interestingly in our case, agent effects do not stem from motivation differences between implementers (as for example identified by @berge2012business), but rather from the way agent identities interact with citizen expectations. In Uganda, as in many electoral authoritarian regimes (the most common regime type in Africa), low levels of political efficacy are discouraging political action; ICT innovations, by themselves, cannot force non-responsive politicians to become responsive.

With the multiple pieces of evidence available to us we infer that the failure of the nationwide program is not simply a function of weak demand on the part of citizens or of the weakness of marketing mechanisms but is a function of larger inequalities. Some of these, such as unevenness in receipt of invitations from Parliament, might be addressable through improved interventions. However, some reflect more fundamental weaknesses in the broader political system, most notably cynicism regarding the competence and motivations of politicians, which Parliament likely cannot address easily through technological innovation. 

Our study also has broader implications for research methodology, and especially for the extent to which outcomes of scaled-up programs can be gleaned from results of controlled small-scale interventions. The literature on scaling up has largely focused on assessing the extent to which experimental estimates in one context apply in another. Some of this literature highlights the problems in using a small handful of studies as the basis for inferences to different contexts [@OS28082015]. Other work highlights the costs of extrapolation. Comparing non-experimental and experimental estimates that rely on the same data, @pritchett2013context conclude that non-experimental estimates with the same subject population can better predict treatment effects as compared to experimental results from other contexts, because contextual variation can drive bigger differences in the estimated effectiveness of a program than selection bias.^[There is a growing literature debating the tradeoffs associated with different approaches to generating out-of-sample predictions based on experimental data. @hotz2005predicting suggest using subjects' observed characteristics as predictive of treatment effects independent of context. @gechter2015generalizing proposes a method that uses differences in outcome distributions for individuals with the same characteristics and treatment status in the original study and the context of interest to learn about unobserved differences across contexts.]

Importantly, there should not have been great differences in the subject population between the FFE and the NFE. The FFE was offered to a random sample of subjects from every constituency in Uganda, while the scaled-up program was offered to a random sub-sample of 186 constituencies, out of a total of 238 (see Supplementary Information, Figure 4). 

Thus the differences we observe draw attention to a distinct problem, largely overlooked by the extant external validity literature: the external validity across the nuts and bolts of *interventions* and not necessarily across *populations*.^[Some of our divergent findings do relate to endogenous changes in populations as a consequence of factors outside the control of the research team. For example, one reason we do not find price effects in the scaled-up program similar to those found in the controlled experiment is that the national intervention was taken up by relatively well-off and engaged citizens who are unlikely to be sensitive to a small price subsidy.] This kind of validity problem is especially critical when lessons from carefully controlled small-scale studies are intended to inform policies to be implemented at a larger scale. Our results provide a cautionary tale for researchers and policy makers seeking to make such claims. 

In our analysis, we identified several distinct reasons why outcomes of experiments may fail to replicate when brought to scale. These include already well-appreciated effects that relate directly to scale (see also @deaton2010instruments on general equilibrium effects). In addition, we highlight possible effects related to the changing agents involved when interventions are implemented at scale (see also @bold2013scaling on capacity and motivation of implementing organizations), and we identify differences related to details in the design between controlled interventions and interventions implemented in the political wild, of the form that may be relevant for other studies. 

<!-- % Using an array of data sources---including two national representative surveys, a callback user survey and a third `mechanism' experiment---we find evidence consistent with the idea that agent effects, but especially design details are consequential. Specifically, we find that due to its strong delivery method, the more controlled framed experiment achieved a level of first-stage compliance that could not be achieved by the national program, even one that relied on an aggressive campaign. Our `mechanism' experiment provides further evidence that personal invitations can drive up participation substantially, although it does not find evidence that this feature is what explains differences in flattening effects.  -->

Ironically, when design details matter, a first response is to resort to controlled conditions to get those details right. This might be an appropriate approach when seeking to control for all factors but a manipulated variable of interest, but one core lesson from our study is that the importance of those details may only become apparent once researchers' control is removed. 



\clearpage \newpage

# ACKNOWLEDGMENTS
We would like to thank the National Democratic Institute, especially Simon Osborn, Ivan Tibemanya and Linda Stern, for a fruitful collaboration. We also thank Melina Platas and Jonathan Rodden for their generous support in facilitating our ``mechanism'' experiment in Arua and Jan Pierskalla and Laura Paler for allowing us to add questions about uSpeak to their nationally representative survey on oil in Uganda. This study benefited tremendously from comments from Michael Findley and Daniel Nielson and from participants at EGAP's workshop meeting at Rice University, and seminars at Johns Hopkins, Georgetown and Harvard University.

# BIOGRAPHICAL STATEMENT

Guy Grossman is an associate professor of political science at the University of Pennsylvania, PA, 19104.  

Macartan Humphreys is a professor of political science at Columbia University, NY 10025 and a research director at WZB Berlin, Berlin 10785. 

Gabriella Sacramone-Lutz is a PhD candidate in Political Science at Columbia University, NY 10025.

# References  {-}


